Re: [PATCHES] [HACKERS] Bgwriter behavior

2005-06-04 Thread Bruce Momjian

Later version of this patch added to the patch queue.

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---


Simon Riggs wrote:
 On Sat, 2005-01-01 at 17:47, Simon Riggs wrote:
  On Sat, 2005-01-01 at 17:01, Bruce Momjian wrote:
   Simon Riggs wrote:

Well, I think we're saying: it's not in 8.0 now, and we take our time to
consider patches for 8.1 and accept the situation that the parameter
names/meaning will change in next release.
   
   I have no problem doing something for 8.0 if we can find something that
   meets all the items I mentioned.
   
   One idea would be to just remove bgwriter_percent.  Beta/RC users would
   still have it in their postgresql.conf, but it is commented out so it
   should be OK.  If they uncomment it their server would not start but we
   could just tell testers to remove it.  I see that as better than having
   conflicting parameters.
  
  Can't say I like that at first thought. I'll think some more though...
  
   Another idea is to have bgwriter_percent be the percent of the buffer it
   will scan.  
  
  Hmmm... well that was my original suggestion (bg2.patch on 12 Dec)
  (...though with a bug, as Neil pointed out)
  
   We could default that to 50% or 100%, but we then need to
   make sure all beta/RC users update their postgresql.conf with the new
   default because the commented-out default will not be correct.
  
  ...we just differ/ed on what the default should be...
  
   At this point I see these as our only two viable options, aside from
   doing nothing.
  
   I realize our current behavior requires a full scan of the buffer cache,
   but how often is the bgwriter_maxpages limit met?  If it is not, a full
   scan is done anyway, right?  
  
  Well, if you have a very heavy read workload then that would be a
  problem. I was more worried about concurrency in a heavy write
  situation, but I can see your point, and agree.
  
  (Idea #1 still suffers from this, so we should rule it out...)
  
   It seems the only way to really add
   functionality is to change bgwriter_percent to control how much of the
   buffer is scanned.
  
  OK. I think you've persuaded me on idea #2, if I understand you right:
  
  bgwriter_percent = 50 (default)
  bgwriter_maxpages = 100 (default)
  
  percent is the number of shared_buffers we scan, limited by maxpages.
  
  (I'll code it up in a couple of hours when the kids are in bed)
 
 Here's the basic patch - no changes to current default values or docs.
 
 Not sure if this is still interesting or not...
 
 -- 
 Best Regards, Simon Riggs

[ Attachment, skipping... ]

 
 ---(end of broadcast)---
 TIP 8: explain analyze is your friend

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073



Re: [PATCHES] [HACKERS] Bgwriter behavior

2005-01-07 Thread Marc G. Fournier
On Fri, 7 Jan 2005, Bruce Momjian wrote:
 Do we want to add this additional log info to CVS for 8.0?

No, unless we're looking for an RC5?


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664


Re: [PATCHES] [HACKERS] Bgwriter behavior

2005-01-07 Thread Tom Lane
Marc G. Fournier [EMAIL PROTECTED] writes:
 On Fri, 7 Jan 2005, Bruce Momjian wrote:
 Do we want to add this additional log info to CVS for 8.0?

 No, unless we're looking for an RC5?

I vote no as well.  While it's probably not a dangerous change, the need
for it has not been demonstrated.

regards, tom lane



Re: [PATCHES] [HACKERS] Bgwriter behavior

2005-01-07 Thread Bruce Momjian
Tom Lane wrote:
 Marc G. Fournier [EMAIL PROTECTED] writes:
  On Fri, 7 Jan 2005, Bruce Momjian wrote:
  Do we want to add this additional log info to CVS for 8.0?
 
  No, unless we're looking for an RC5?
 
 I vote no as well.  While it's probably not a dangerous change, the need
 for it has not been demonstrated.

OK. Simon, would you email me a copy of the patch again privately so I
can put it in the 8.1 queue?  I seem to have lost the email.  Thanks.



Re: [PATCHES] [HACKERS] Bgwriter behavior

2005-01-06 Thread Bruce Momjian

Do we want to add this additional log info to CVS for 8.0?

---

Simon Riggs wrote:
 On Mon, 2005-01-03 at 19:14 -0500, Bruce Momjian wrote:
  Simon Riggs wrote:
   Here's my bgwriter instrumentation patch, which gives info that could
   allow the bgwriter settings to be tuned.
  
  Uh, what does this do exactly?  Add additional logging output?
  
 
 Produces output like this...
 
 DEBUG:ARC T1target=  45 B1len= 4954 T1len=   40 T2len= 4960 B2len=   46
 DEBUG:ARC total   =  98% B1hit=   0% T1hit=   0% T2hit=  98% B2hit=   0%
 DEBUG:ARC buffer dirty misses=   22% (wasted=0); cleaned= 4494
 
 when you have debug_shared_buffers (= n) set
 and you have server messages DEBUG1 available.
 
 The last line of log output has been replaced by this version.
 
 -- 
 Best Regards, Simon Riggs
 
 


Re: [PATCHES] [HACKERS] Bgwriter behavior

2005-01-04 Thread Simon Riggs
On Mon, 2005-01-03 at 19:14 -0500, Bruce Momjian wrote:
 Simon Riggs wrote:
  Here's my bgwriter instrumentation patch, which gives info that could
  allow the bgwriter settings to be tuned.
 
 Uh, what does this do exactly?  Add additional logging output?
 

Produces output like this...

DEBUG:ARC T1target=  45 B1len= 4954 T1len=   40 T2len= 4960 B2len=   46
DEBUG:ARC total   =  98% B1hit=   0% T1hit=   0% T2hit=  98% B2hit=   0%
DEBUG:ARC buffer dirty misses=   22% (wasted=0); cleaned= 4494

when you have debug_shared_buffers (= n) set
and you have server messages DEBUG1 available.

The last line of log output has been replaced by this version.
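
As a rough illustration of how the per-list percentages in the second
line could be derived: the freelist.c hunk of the patch posted in this
thread divides each list's hit counter by the lookup counter using
integer arithmetic, and reports zero when there were no lookups. A
minimal sketch in C, assuming a hypothetical helper named hit_percent
(it is not a function in the PostgreSQL source):

```c
#include <assert.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns hits as a whole-number percentage of lookups, truncating
 * toward zero, and returns 0 when no lookups were recorded.
 */
long
hit_percent(long hits, long lookups)
{
	assert(hits >= 0 && lookups >= 0 && hits <= lookups);

	if (lookups == 0)
		return 0;				/* avoid dividing by zero */
	return hits * 100 / lookups;
}
```

Because the division truncates, 1 hit in 3 lookups is reported as 33%,
so the per-list figures need not add up to exactly 100.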

-- 
Best Regards, Simon Riggs




Re: [PATCHES] [HACKERS] Bgwriter behavior

2005-01-03 Thread Bruce Momjian

OK, we have a submitted patch that attempts to improve bgwriter by
making bgwriter_percent control what percentage of the buffer is
scanned.

The patch still needs doc changes and a change to the default value but
at this point we need a vote on the patch.  Is it:

* too late for 8.0
* not the right improvement
* to be applied with doc/default additions

Comments?



Re: [PATCHES] [HACKERS] Bgwriter behavior

2005-01-03 Thread Tom Lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
 OK, we have a submitted patch that attempts to improve bgwriter by
 making bgwriter_percent control what percentage of the buffer is
 scanned.

 The patch still needs doc changes and a change to the default value but
 at this point we need a vote on the patch.  Is it:

   * too late for 8.0
   * not the right improvement
   * to be applied with doc/default additions

My vote: too late for 8.0.  There is no hard evidence that this is a
useful improvement, and no time for such evidence to be obtained.

regards, tom lane



Re: [PATCHES] [HACKERS] Bgwriter behavior

2005-01-03 Thread Marc G. Fournier
On Mon, 3 Jan 2005, Bruce Momjian wrote:
 OK, we have a submitted patch that attempts to improve bgwriter by
 making bgwriter_percent control what percentage of the buffer is
 scanned.

 The patch still needs doc changes and a change to the default value but
 at this point we need a vote on the patch.  Is it:

 * too late for 8.0

Too late by at least 3 RCs ...

Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664


Re: [PATCHES] [HACKERS] Bgwriter behavior

2005-01-03 Thread Simon Riggs
On Mon, 2005-01-03 at 20:09, Bruce Momjian wrote:
 OK, we have a submitted patch that attempts to improve bgwriter by
 making bgwriter_percent control what percentage of the buffer is
 scanned.
 
 The patch still needs doc changes and a change to the default value but
 at this point we need a vote on the patch.  Is it:
 
   * too late for 8.0
   * not the right improvement
   * to be applied with doc/default additions
 
 Comments?
 
 ---
 
 Simon Riggs wrote:
  On Sat, 2005-01-01 at 17:47, Simon Riggs wrote:
   On Sat, 2005-01-01 at 17:01, Bruce Momjian wrote:
    Simon Riggs wrote:

     Well, I think we're saying: it's not in 8.0 now, and we take our time to
     consider patches for 8.1 and accept the situation that the parameter
     names/meaning will change in next release.


I hear veto ... so the above situation stands then: 8.1 it is.

Not unhappy...I want this thing released as much as the next man...

-- 
Best Regards, Simon Riggs




Re: [PATCHES] [HACKERS] Bgwriter behavior

2005-01-03 Thread Bruce Momjian
Simon Riggs wrote:
 On Mon, 2005-01-03 at 20:09, Bruce Momjian wrote:
  OK, we have a submitted patch that attempts to improve bgwriter by
  making bgwriter_percent control what percentage of the buffer is
  scanned.
  
  The patch still needs doc changes and a change to the default value but
  at this point we need a vote on the patch.  Is it:
  
  * too late for 8.0
  * not the right improvement
  * to be applied with doc/default additions
  
  Comments?
  
  ---
  
  Simon Riggs wrote:
   On Sat, 2005-01-01 at 17:47, Simon Riggs wrote:
    On Sat, 2005-01-01 at 17:01, Bruce Momjian wrote:
     Simon Riggs wrote:

      Well, I think we're saying: it's not in 8.0 now, and we take our time to
      consider patches for 8.1 and accept the situation that the parameter
      names/meaning will change in next release.
 
 
 I hear veto ... so the above situation stands then: 8.1 it is.
 
 Not unhappy...I want this thing released as much as the next man...

Well, we went through the process and that's the best we can do.



Re: [PATCHES] [HACKERS] Bgwriter behavior

2005-01-03 Thread Simon Riggs
On Mon, 2005-01-03 at 23:03, Bruce Momjian wrote:
 Simon Riggs wrote:
  On Mon, 2005-01-03 at 20:09, Bruce Momjian wrote:
   OK, we have a submitted patch that attempts to improve bgwriter by
   making bgwriter_percent control what percentage of the buffer is
   scanned.
   
   The patch still needs doc changes and a change to the default value but
   at this point we need a vote on the patch.  Is it:
   
   * too late for 8.0
   * not the right improvement
   * to be applied with doc/default additions

   Comments?

   ---

   Simon Riggs wrote:
    On Sat, 2005-01-01 at 17:47, Simon Riggs wrote:
     On Sat, 2005-01-01 at 17:01, Bruce Momjian wrote:
      Simon Riggs wrote:

       Well, I think we're saying: it's not in 8.0 now, and we take our time to
       consider patches for 8.1 and accept the situation that the parameter
       names/meaning will change in next release.
  
  
  I hear veto ... so the above situation stands then: 8.1 it is.
  
  Not unhappy...I want this thing released as much as the next man...
 
 Well, we went through the process and that's the best we can do.

Here's my bgwriter instrumentation patch, which gives info that could
allow the bgwriter settings to be tuned.

-- 
Best Regards, Simon Riggs
Index: src/backend/storage/buffer/bufmgr.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v
retrieving revision 1.182
diff -d -c -r1.182 bufmgr.c
*** src/backend/storage/buffer/bufmgr.c	24 Nov 2004 02:56:17 -	1.182
--- src/backend/storage/buffer/bufmgr.c	4 Jan 2005 00:04:18 -
***************
*** 440,445 ****
--- 440,446 ----
  UnpinBuffer(buf, true);
  inProgress = FALSE;
  buf = NULL;
+ StrategyBufferStatWastedIO();
  			}
  		}
  	} while (buf == NULL);
***************
*** 682,687 ****
--- 683,689 ----
  	BufferDesc **dirty_buffers;
  	BufferTag  *buftags;
  	int			num_buffer_dirty;
+ 	int			num_buffer_cleaned = 0;
  	int			i;
  
  	/* If either limit is zero then we are disabled from doing anything... */
***************
*** 770,775 ****
--- 772,778 ----
  
  		TerminateBufferIO(bufHdr, 0);
  		UnpinBuffer(bufHdr, true);
+ num_buffer_cleaned++;
  	}
  
  	LWLockRelease(BufMgrLock);
***************
*** 777,782 ****
--- 780,787 ----
  	pfree(dirty_buffers);
  	pfree(buftags);
  
+ StrategyBufferStatCleaned(num_buffer_cleaned);
+ 
  	return num_buffer_dirty;
  }
  
Index: src/backend/storage/buffer/freelist.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/freelist.c,v
retrieving revision 1.48
diff -d -c -r1.48 freelist.c
*** src/backend/storage/buffer/freelist.c	16 Sep 2004 16:58:31 -	1.48
--- src/backend/storage/buffer/freelist.c	4 Jan 2005 00:04:18 -
***************
*** 115,120 ****
--- 115,133 ----
  } while(0)
  
  
+ void
+ StrategyBufferStatWastedIO(void)
+ {
+ 	StrategyControl->num_wasted++;
+ }
+ 
+ void
+ StrategyBufferStatCleaned(long num_cleaned)
+ {
+ 	StrategyControl->num_cleaned += num_cleaned;
+ }
+ 
+ 
  /*
   * Printout for use when DebugSharedBuffers is enabled
   */
***************
*** 130,159 ****
  	t1_hit,
  	t2_hit,
  	b2_hit;
- 		int			id,
- 	t1_clean,
- 	t2_clean;
  		ErrorContextCallback *errcxtold;
  
- 		id = StrategyControl->listHead[STRAT_LIST_T1];
- 		t1_clean = 0;
- 		while (id >= 0)
- 		{
- 			if (BufferDescriptors[StrategyCDB[id].buf_id].flags & BM_DIRTY)
- 				break;
- 			t1_clean++;
- 			id = StrategyCDB[id].next;
- 		}
- 		id = StrategyControl->listHead[STRAT_LIST_T2];
- 		t2_clean = 0;
- 		while (id >= 0)
- 		{
- 			if (BufferDescriptors[StrategyCDB[id].buf_id].flags & BM_DIRTY)
- 				break;
- 			t2_clean++;
- 			id = StrategyCDB[id].next;
- 		}
- 
  		if (StrategyControl->num_lookup == 0)
  			all_hit = b1_hit = t1_hit = t2_hit = b2_hit = 0;
  		else
--- 143,150 ----
***************
*** 166,185 ****
  	  StrategyControl->num_lookup);
  			b2_hit = (StrategyControl->num_hit[STRAT_LIST_B2] * 100 /
  	  StrategyControl->num_lookup);
! 			all_hit = b1_hit + t1_hit + t2_hit + b2_hit;
  		}
  
  		errcxtold = error_context_stack;
  		error_context_stack = NULL;
  		elog(DEBUG1, "ARC T1target=%5d B1len=%5d T1len=%5d T2len=%5d B2len=%5d",
  			 T1_TARGET, B1_LENGTH, T1_LENGTH, T2_LENGTH, B2_LENGTH);
! 		elog(DEBUG1, "ARC total   =%4ld%% B1hit=%4ld%% T1hit=%4ld%% T2hit=%4ld%% B2hit=%4ld%%",
  			 all_hit, b1_hit, t1_hit, t2_hit, b2_hit);
! 		elog(DEBUG1, "ARC clean buffers at LRU   T1=   %5d T2=   %5d",
! 			 t1_clean, t2_clean);
! 		error_context_stack = errcxtold;
  
  		StrategyControl->num_lookup = 0;
  		StrategyControl->num_hit[STRAT_LIST_B1] = 0;
  		StrategyControl->num_hit[STRAT_LIST_T1] = 0;
  		StrategyControl->num_hit[STRAT_LIST_T2] = 0;
--- 157,188 ----
  	  StrategyControl->num_lookup);

Re: [PATCHES] [HACKERS] Bgwriter behavior

2005-01-03 Thread Bruce Momjian
Simon Riggs wrote:
 Here's my bgwriter instrumentation patch, which gives info that could
 allow the bgwriter settings to be tuned.

Uh, what does this do exactly?  Add additional logging output?



Re: [PATCHES] [HACKERS] Bgwriter behavior

2005-01-03 Thread Bruce Momjian

This has been saved for the 8.1 release:

http://momjian.postgresql.org/cgi-bin/pgpatches2



Re: [PATCHES] [HACKERS] Bgwriter behavior

2005-01-01 Thread Simon Riggs
On Sat, 2005-01-01 at 06:20, Bruce Momjian wrote:
 This change isn't going to make it for RC3, and it is probably not
 something we want to rush.

OK. Thank you.

 I think there are a few issues involved:
 
   o  everyone agrees the current meaning of bgwriter_percent is
      useless (percent of dirty buffers)
   o  removal of bgwriter_percent will cause problems because
      postgresql.conf is only installed via initdb, so beta users
      will have to have some workaround so their existing
      postgresql.conf files work.
   o  bgwriter_percent and bgwriter_maxpages are duplicate for a
      given number of buffers and it isn't clear which one takes
      precedence.
   o  8.1 might use these variables with different meanings,
      causing slight upgrade confusion.
   o  Another idea is for bgwriter_percent to control how much of
      the buffer is scanned.
 

Agreed.

Would add as item #1: current behaviour of bgwriter causes sub-optimal
performance for 8.0, for systems with a high write workload, more CPUs
and higher shared_buffers.

 Tom feels bgwriter_maxpages is good because it allows the user to
 specify the I/O traffic, while bgwriter_percent as total pages (not just
 dirty ones) is perhaps easier to set a default (I/O load varies based on
 buffer cache size) and perhaps easier to understand.
 

Agreed.

 I am not sure what to suggest at this point but whatever solution we use
 should take the above issues into account.

Well, I think we're saying: it's not in 8.0 now, and we take our time to
consider patches for 8.1 and accept the situation that the parameter
names/meaning will change in next release.

The patch is there if that decision changes, but I'll say no more on it.
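
For orientation, the ceiling mentioned in the quoted patch description
further down (1% of 1000 buffers, written every 250ms) works out as
follows. This is only a sketch of the arithmetic in C; the function
names pages_per_round and max_pages_per_second are made up for
illustration, and the estimate ignores the time the writes themselves
take:

```c
#include <assert.h>

/*
 * Hypothetical helper: number of buffers the bgwriter may write in one
 * round if bgwriter_percent means "percent of shared_buffers".
 */
long
pages_per_round(long shared_buffers, long percent)
{
	assert(shared_buffers >= 0 && percent >= 0 && percent <= 100);
	return shared_buffers * percent / 100;
}

/*
 * Hypothetical helper: upper bound on pages written per second when a
 * round starts every delay_ms milliseconds.
 */
long
max_pages_per_second(long shared_buffers, long percent, long delay_ms)
{
	assert(delay_ms > 0);
	return pages_per_round(shared_buffers, percent) * 1000 / delay_ms;
}
```

With shared_buffers = 1000, bgwriter_percent = 1 and bgwriter_delay =
250 this gives 10 pages per round and at most 40 pages per second, or
about 320 kB/s assuming the default 8 kB block size.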

 ---
 
 Simon Riggs wrote:
  On Fri, 2004-12-31 at 01:14, Bruce Momjian wrote:
   Simon Riggs wrote:
    On Mon, 2004-12-27 at 22:21, Bruce Momjian wrote:
     Should we consider at least adjusting the meaning of bgwriter_percent?

    Yes. As things stand, this is the only change that seems safe.

    Here's a very short patch that implements this change within BufferSync
    in bufmgr.c

    - No algorithm changes
    - No error message changes
    - Only change is the call to StrategyDirtyBufferList is made using the
    maximum number of buffers that will be cleaned, rather than uselessly
    trawling through all of shared_buffers

    This changes the meaning of bgwriter_percent from percent of dirty
    buffers to percent of shared_buffers. The default settings of 1% of
    1000 buffers gives up to 10 dirty block writes every 250ms

    Benefit: allows performance tuning by increasing the options for
    setting bgwriter_delay, which would otherwise have an ineffectually
    high minimum setting

    Risk: low

    1-line doc patch to follow, if this is approved.
   
   I am not objecting to the patch, but what value is there in having both
   bgwriter_percent and bgwriter_maxpages?  Seems both are redundant and
   that one would be enough.
  
  In brief:
  i) for now: as little change as possible is good
  ii) the two parameters are OK
  iii) trying to decide an alternative takes time, which we do not have
  iv) what is presented here is simply a performance bug fix, not the best
  long term alternative...
  
  I'd like to move quickly: if we do this (or an alternative), it has to
  be done soon and it would be easy to discuss this until we run out of
  time. Could we vote: in RC3, or not?
  
  In more detail... 
  
  The value of having both is:
  i) as little change as possible at this stage of RC - the main one
  ...which gives us stability
  ...and also avoids having to re-discuss what they *should* be
  
  ii) Having two isn't that bad. bgwriter_percent auto adjusts the length
  of the to-be-cleaned-list, so it is roughly useful anywhere between 500
  and 1 shared_buffers. That is IMHO slightly more useful than a hard
  definition set via bgwriter_maxpages, since that is likely to be set
  wrong anyway - but has some value as an outside limit on the number of
  pages. [You may wish to set shared_buffers  1 even on smaller
  servers, since many now have 2GB RAM and yet a relatively poor I/O
  subsystem. Having maxpages set separately allows the majority of people
  to set shared_buffers higher without swamping their I/O subsystems
  because they didn't know about the r8.0 bgwriter feature/parameters]
  
  iii) changing the parameters might tempt us towards changing the
  algorithm, which is not a topic we have reached agreement on
  
  iv) I see it as a goal to remove all of those parameters anyway, as well
  as explore some of the many options and ideas everybody has presented,
  so further change is likely at the next release whatever is done now.
  
  The patch is as simple as I can make it and yet remove the unnecessary
  performance effect in the existing 

Re: [PATCHES] [HACKERS] Bgwriter behavior

2005-01-01 Thread Bruce Momjian
Simon Riggs wrote:
 On Sat, 2005-01-01 at 06:20, Bruce Momjian wrote:
  This change isn't going to make it for RC3, and it is probably not
  something we want to rush.
 
 OK. Thank you.
 
  I think there are a few issues involved:
  
  o  everyone agrees the current meaning of bgwriter_percent is
     useless (percent of dirty buffers)
  o  removal of bgwriter_percent will cause problems because
     postgresql.conf is only installed via initdb, so beta users
     will have to have some workaround so their existing
     postgresql.conf files work.
  o  bgwriter_percent and bgwriter_maxpages are duplicate for a
     given number of buffers and it isn't clear which one takes
     precedence.
  o  8.1 might use these variables with different meanings,
     causing slight upgrade confusion.
  o  Another idea is for bgwriter_percent to control how much of
     the buffer is scanned.
  
 
 Agreed.
 
 Would add as item #1: current behaviour of bgwriter causes sub-optimal
 performance for 8.0, for systems with a high write workload, more CPUs
 and higher shared_buffers.
 
  Tom feels bgwriter_maxpages is good because it allows the user to
  specify the I/O traffic, while bgwriter_percent as total pages (not just
  dirty ones) is perhaps easier to set a default (I/O load varies based on
  buffer cache size) and perhaps easier to understand.
  
 
 Agreed.
 
  I am not sure what to suggest at this point but whatever solution we use
  should take the above issues into account.
 
 Well, I think we're saying: it's not in 8.0 now, and we take our time to
 consider patches for 8.1 and accept the situation that the parameter
 names/meaning will change in next release.

I have no problem doing something for 8.0 if we can find something that
meets all the items I mentioned.

One idea would be to just remove bgwriter_percent.  Beta/RC users would
still have it in their postgresql.conf, but it is commented out so it
should be OK.  If they uncomment it their server would not start but we
could just tell testers to remove it.  I see that as better than having
conflicting parameters.

Another idea is to have bgwriter_percent be the percent of the buffer it
will scan.  We could default that to 50% or 100%, but we then need to
make sure all beta/RC users update their postgresql.conf with the new
default because the commented-out default will not be correct.

At this point I see these as our only two viable options, aside from
doing nothing.

I realize our current behavior requires a full scan of the buffer cache,
but how often is the bgwriter_maxpages limit met?  If it is not, a full
scan is done anyway, right?  It seems the only way to really add
functionality is to change bgwriter_percent to control how much of the
buffer is scanned.



Re: [PATCHES] [HACKERS] Bgwriter behavior

2005-01-01 Thread Simon Riggs
On Sat, 2005-01-01 at 17:01, Bruce Momjian wrote:
 Simon Riggs wrote:
  
  Well, I think we're saying: it's not in 8.0 now, and we take our time to
  consider patches for 8.1 and accept the situation that the parameter
  names/meaning will change in the next release.
 
 I have no problem doing something for 8.0 if we can find something that
 meets all the items I mentioned.
 
 One idea would be to just remove bgwriter_percent.  Beta/RC users would
 still have it in their postgresql.conf, but it is commented out so it
 should be OK.  If they uncomment it their server would not start but we
 could just tell testers to remove it.  I see that as better than having
 conflicting parameters.

Can't say I like that at first thought. I'll think some more though...

 Another idea is to have bgwriter_percent be the percent of the buffer it
 will scan.  

Hmmm... well, that was my original suggestion (bg2.patch on 12 Dec)
(...though with a bug, as Neil pointed out)

 We could default that to 50% or 100%, but we then need to
 make sure all beta/RC users update their postgresql.conf with the new
 default because the commented-out default will not be correct.

...we just differ/ed on what the default should be...

 At this point I see these as our only two viable options, aside from
 doing nothing.

 I realize our current behavior requires a full scan of the buffer cache,
 but how often is the bgwriter_maxpages limit met?  If it is not, a full
 scan is done anyway, right?

Well, if you have a very heavy read workload then that would be a
problem. I was more worried about concurrency in a heavy write
situation, but I can see your point, and agree.

(Idea #1 still suffers from this, so we should rule it out...)

 It seems the only way to really add
 functionality is to change bgwriter_percent to control how much of the
 buffer is scanned.

OK. I think you've persuaded me on idea #2, if I understand you right:

bgwriter_percent = 50 (default)
bgwriter_maxpages = 100 (default)

percent is the percentage of shared_buffers we scan, limited by maxpages.
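
As a minimal sketch of how the two limits could combine under idea #2:
the helper names below are hypothetical and are not taken from bufmgr.c;
only the round-up arithmetic mirrors the existing BufferSync code.

```c
#include <assert.h>

/* Hypothetical helper: how many buffers one bgwriter round may scan. */
int bgwriter_scan_limit(int nbuffers, int percent)
{
    assert(percent >= 0 && percent <= 100);
    /* round up, as the existing BufferSync percent arithmetic does */
    return (nbuffers * percent + 99) / 100;
}

/* Hypothetical helper: upper bound on pages written in one round. */
int bgwriter_write_limit(int nbuffers, int percent, int maxpages)
{
    int scan = bgwriter_scan_limit(nbuffers, percent);

    assert(maxpages >= 0);
    /* we cannot write more dirty buffers than we scan */
    return maxpages < scan ? maxpages : scan;
}
```

With percent = 50, maxpages = 100 and 1000 shared_buffers, a round under
this sketch would scan at most 500 buffers and write at most 100 of them.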

(I'll code it up in a couple of hours when the kids are in bed)

-- 
Best Regards, Simon Riggs




Re: [PATCHES] [HACKERS] Bgwriter behavior

2005-01-01 Thread Tom Lane
Bruce Momjian pgman@candle.pha.pa.us writes:
   o  everyone agrees the current meaning of bgwriter_percent is
  useless (percent of dirty buffers)

Oh?

It's not useless by any means; it's a perfectly reasonable and useful
definition that happens to be expensive to implement.  One of the
questions that is not answered to my satisfaction is what is an adequate
substitute that doesn't lose needed functionality.

   o  bgwriter_percent and bgwriter_maxpages are duplicate for a
  given number of buffers and it isn't clear which one takes
  precedence.

Not unless the current definition of bgwriter_percent is changed.

Please try to make sure that your summaries reduce confusion instead
of increasing it.

regards, tom lane



Re: [PATCHES] [HACKERS] Bgwriter behavior

2005-01-01 Thread Simon Riggs
Here's the basic patch - no changes to current default values or docs.

Not sure if this is still interesting or not...

-- 
Best Regards, Simon Riggs
Index: src/backend/storage/buffer/bufmgr.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v
retrieving revision 1.182
diff -d -c -r1.182 bufmgr.c
*** src/backend/storage/buffer/bufmgr.c	24 Nov 2004 02:56:17 -	1.182
--- src/backend/storage/buffer/bufmgr.c	1 Jan 2005 21:03:16 -
***
*** 682,717 
  	BufferDesc **dirty_buffers;
  	BufferTag  *buftags;
  	int			num_buffer_dirty;
  	int			i;
  
  	/* If either limit is zero then we are disabled from doing anything... */
  	if (percent == 0 || maxpages == 0)
  		return 0;
  
  	/*
! 	 * Get a list of all currently dirty buffers and how many there are.
  	 * We do not flush buffers that get dirtied after we started. They
! 	 * have to wait until the next checkpoint.
  	 */
! 	dirty_buffers = (BufferDesc **) palloc(NBuffers * sizeof(BufferDesc *));
! 	buftags = (BufferTag *) palloc(NBuffers * sizeof(BufferTag));
  
  	LWLockAcquire(BufMgrLock, LW_EXCLUSIVE);
- 	num_buffer_dirty = StrategyDirtyBufferList(dirty_buffers, buftags,
- 			   NBuffers);
  
! 	/*
! 	 * If called by the background writer, we are usually asked to only
! 	 * write out some portion of dirty buffers now, to prevent the IO
! 	 * storm at checkpoint time.
! 	 */
! 	if (percent > 0)
! 	{
! 		Assert(percent <= 100);
! 		num_buffer_dirty = (num_buffer_dirty * percent + 99) / 100;
! 	}
! 	if (maxpages > 0 && num_buffer_dirty > maxpages)
! 		num_buffer_dirty = maxpages;
  
  	/* Make sure we can handle the pin inside the loop */
  	ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
--- 682,728 
  	BufferDesc **dirty_buffers;
  	BufferTag  *buftags;
  	int			num_buffer_dirty;
+ int max_buffer_dirty = 1;
+ int max_buffer_scan = 1;
  	int			i;
  
  	/* If either limit is zero then we are disabled from doing anything... */
  	if (percent == 0 || maxpages == 0)
  		return 0;
  
+ /* Set number of buffers we will scan from LRUs of buffer lists */
+ if (percent > 0 ) {
+ 	Assert(percent <= 100);
+	max_buffer_scan = (NBuffers * percent + 99) / 100;
+ }
+ 
+ /* at checkpoint time we scan the whole buffer list */
+ if (percent < 0)
+ 	max_buffer_scan = NBuffers;
+ 
+ if (maxpages < 0 || maxpages > NBuffers)
+ 	max_buffer_dirty = NBuffers;
+ else
+ max_buffer_dirty = maxpages;
+ 
+ /* we cannot find more dirty buffers than we scan */
+ if (max_buffer_dirty > max_buffer_scan)
+ max_buffer_dirty = max_buffer_scan;
+ 
  	/*
! 	 * Get a list of dirty buffers to clean and how 

Re: [PATCHES] [HACKERS] Bgwriter behavior

2005-01-01 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian pgman@candle.pha.pa.us writes:
  o  everyone agrees the current meaning of bgwriter_percent is
 useless (percent of dirty buffers)
 
 Oh?
 
 It's not useless by any means; it's a perfectly reasonable and useful
 definition that happens to be expensive to implement.  One of the
 questions that is not answered to my satisfaction is what is an adequate
 substitute that doesn't lose needed functionality.

I remembered this statement:

 I think there's a reasonable case to be made for redefining
 bgwriter_percent as the max percent of the total buffer list to scan
 (not the max percent of the list to return --- Jan correctly pointed out
 that the latter is useless).  Then we could modify
 StrategyDirtyBufferList so that the percent and maxpages parameters are
 passed in, so it can stop as soon as either one is satisfied.  This
 would be a fairly small/safe code change and I wouldn't have a problem
 doing it even at this late stage of the cycle.
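
A rough illustration of "stop as soon as either one is satisfied",
assuming a plain array in LRU order stands in for the real strategy
lists. collect_dirty and its parameters are hypothetical and are not the
actual StrategyDirtyBufferList signature.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for a dirty-buffer collector: walk buffers in
 * LRU order and stop after scanning max_scan entries or collecting
 * max_dirty dirty ones, whichever comes first.  is_dirty[i] stands in
 * for the real buffer header flag.  Returns the number collected.
 */
int collect_dirty(const bool *is_dirty, int nbuffers,
                  int max_scan, int max_dirty, int *out_ids)
{
    int found = 0;

    assert(max_scan >= 0 && max_dirty >= 0);
    for (int i = 0; i < nbuffers && i < max_scan && found < max_dirty; i++)
    {
        if (is_dirty[i])
            out_ids[found++] = i;
    }
    return found;
}
```

Here max_scan would be derived from the percent setting and max_dirty
from maxpages, so the scan never visits more of the list than either
limit allows.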

Referenced here:

http://archives.postgresql.org/pgsql-hackers/2004-12/msg00703.php

But I now see that Jan was objecting to the idea of the previous patch
where bgwriter_percent is a percent of all buffers to return, which we
just discussed as redundant.

  o  bgwriter_percent and bgwriter_maxpages are duplicate for a
 given number of buffers and it isn't clear which one takes
 precedence.
 
 Not unless the current definition of bgwriter_percent is changed.
 
 Please try to make sure that your summaries reduce confusion instead
 of increasing it.

OK, whatever.  My point is that many have criticized the current
behavior of bgwriter_percent and I haven't heard anyone defend it,
including Jan.

What bothers me is that we have known bgwriter needs tuning for months
and I am not sure we are any closer to improving it.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] Bgwriter behavior

2004-12-31 Thread Simon Riggs
On Fri, 2004-12-31 at 01:14, Bruce Momjian wrote:
 Simon Riggs wrote:
  On Mon, 2004-12-27 at 22:21, Bruce Momjian wrote:
   Should we consider at least adjusting the meaning of bgwriter_percent?
  
  Yes. As things stand, this is the only change that seems safe.
  
  Here's a very short patch that implements this change within BufferSync
  in bufmgr.c 
  
  - No algorithm changes
  - No error message changes
  - The only change is that the call to StrategyDirtyBufferList is made
  using the maximum number of buffers that will be cleaned, rather than
  uselessly trawling through all of shared_buffers
  
  This changes the meaning of bgwriter_percent from percent of dirty
  buffers to percent of shared_buffers. The default settings of 1% of
  1000 buffers give up to 10 dirty block writes every 250ms.
  
  Benefit: allows performance tuning by increasing the options for setting
  bgwriter_delay, which would otherwise have an ineffectually high minimum
  setting
  
  Risk: low
  
  1-line doc patch to follow, if this is approved.
 
 I am not objecting to the patch, but what value is there in having both
 bgwriter_percent and bgwriter_maxpages?  Seems both are redundant and
 that one would be enough.

In brief:
i) for now: as little change as possible is good
ii) the two parameters are OK
iii) trying to decide an alternative takes time, which we do not have
iv) what is presented here is simply a performance bug fix, not the best
long term alternative...

I'd like to move quickly: if we do this (or an alternative), it has to
be done soon and it would be easy to discuss this until we run out of
time. Could we vote: in RC3, or not?

In more detail... 

The value of having both is:
i) as little change as possible at this stage of RC - the main one
...which gives us stability
...and also avoids having to re-discuss what they *should* be

ii) Having two isn't that bad. bgwriter_percent auto adjusts the length
of the to-be-cleaned-list, so it is roughly useful anywhere between 500
and 1 shared_buffers. That is IMHO slightly more useful than a hard
definition set via bgwriter_maxpages, since that is likely to be set
wrong anyway - but has some value as an outside limit on the number of
pages. [You may wish to set shared_buffers  1 even on smaller
servers, since many now have 2GB RAM and yet a relatively poor I/O
subsystem. Having maxpages set separately allows the majority of people
to set shared_buffers higher without swamping their I/O subsystems
because they didn't know about the r8.0 bgwriter feature/parameters]

iii) changing the parameters might tempt us towards changing the
algorithm, which is not a topic we have reached agreement on

iv) I see it as a goal to remove all of those parameters anyway, as well
as explore some of the many options and ideas everybody has presented,
so further change is likely at the next release whatever is done now.

The patch is as simple as I can make it and yet remove the unnecessary
performance effect in the existing code. Thanks to Neil and others for
showing that this was possible...I see this patch as a team effort.

I've already spoken against larger change and would do so again now: if
we don't agree this change, then I would vote for no-change simply
because this patch is minimal change. We *suspect* further change is
beneficial but we have no evidence to support what that change should
be, amongst the large range of possible solutions proposed.

-- 
Best Regards, Simon Riggs




Re: [PATCHES] [HACKERS] Bgwriter behavior

2004-12-31 Thread Bruce Momjian

This change isn't going to make it for RC3, and it is probably not
something we want to rush.

I think there are a few issues involved:

o  everyone agrees the current meaning of bgwriter_percent is
   useless (percent of dirty buffers)
o  removal of bgwriter_percent will cause problems because
   postgresql.conf is only installed via initdb, so beta users
   will have to have some workaround so their existing
   postgresql.conf files work.
o  bgwriter_percent and bgwriter_maxpages are duplicate for a
   given number of buffers and it isn't clear which one takes
   precedence.
o  8.1 might use these variables with different meanings,
   causing slight upgrade confusion.
o  Another idea is for bgwriter_percent to control how much of
   the buffer is scanned.

Tom feels bgwriter_maxpages is good because it allows the user to
specify the I/O traffic, while bgwriter_percent as total pages (not just
dirty ones) is perhaps easier to set a default (I/O load varies based on
buffer cache size) and perhaps easier to understand.

I am not sure what to suggest at this point but whatever solution we use
should take the above issues into account.

Re: [HACKERS] Bgwriter behavior

2004-12-30 Thread Simon Riggs
On Mon, 2004-12-27 at 22:21, Bruce Momjian wrote:
 Should we consider at least adjusting the meaning of bgwriter_percent?

Yes. As things stand, this is the only change that seems safe.

Here's a very short patch that implements this change within BufferSync
in bufmgr.c 

- No algorithm changes
- No error message changes
- The only change is that the call to StrategyDirtyBufferList is made
using the maximum number of buffers that will be cleaned, rather than
uselessly trawling through all of shared_buffers

This changes the meaning of bgwriter_percent from percent of dirty
buffers to percent of shared_buffers. The default settings of 1% of
1000 buffers give up to 10 dirty block writes every 250ms.
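
The arithmetic behind that figure can be sketched as follows. The
function names are illustrative only; the rounding follows the
(N * percent + 99) / 100 form used in BufferSync, and the per-second
bound ignores the time the writes themselves take.

```c
#include <assert.h>

/* Pages one round may clean: percent of shared_buffers, rounded up. */
int pages_per_round(int shared_buffers, int percent)
{
    assert(percent >= 0 && percent <= 100);
    return (shared_buffers * percent + 99) / 100;
}

/* Upper bound on pages written per second: whole rounds that fit in
 * one second, times the per-round limit. */
int max_pages_per_second(int pages_round, int delay_ms)
{
    assert(delay_ms > 0);
    return pages_round * (1000 / delay_ms);
}
```

For 1000 buffers at 1% this gives 10 pages per round, and with a 250ms
delay at most 4 rounds, so no more than 40 page writes per second.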

Benefit: allows performance tuning by increasing the options for setting
bgwriter_delay, which would otherwise have an ineffectually high minimum
setting

Risk: low

1-line doc patch to follow, if this is approved.

-- 
Best Regards, Simon Riggs
Index: src/backend/storage/buffer/bufmgr.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v
retrieving revision 1.182
diff -d -c -r1.182 bufmgr.c
*** src/backend/storage/buffer/bufmgr.c	24 Nov 2004 02:56:17 -	1.182
--- src/backend/storage/buffer/bufmgr.c	30 Dec 2004 23:52:24 -
***
*** 681,686 
--- 681,687 
  {
  	BufferDesc **dirty_buffers;
  	BufferTag  *buftags;
+ int dirty_buffers_maxlen = 1;
  	int			num_buffer_dirty;
  	int			i;
  
***
*** 688,717 
  	if (percent == 0 || maxpages == 0)
  		return 0;
  
  	/*
! 	 * Get a list of all currently dirty buffers and how many there are.
  	 * We do not flush buffers that get dirtied after we started. They
! 	 * have to wait until the next checkpoint.
  	 */
! 	dirty_buffers = (BufferDesc **) palloc(NBuffers * sizeof(BufferDesc *));
! 	buftags = (BufferTag *) palloc(NBuffers * sizeof(BufferTag));
  
  	LWLockAcquire(BufMgrLock, LW_EXCLUSIVE);
- 	num_buffer_dirty = StrategyDirtyBufferList(dirty_buffers, buftags,
- 			   NBuffers);
  
! 	/*
! 	 * If called by the background writer, we are usually asked to only
! 	 * write out some portion of dirty buffers now, to prevent the IO
! 	 * storm at checkpoint time.
! 	 */
! 	if (percent > 0)
! 	{
! 		Assert(percent <= 100);
! 		num_buffer_dirty = (num_buffer_dirty * percent + 99) / 100;
! 	}
! 	if (maxpages > 0 && num_buffer_dirty > maxpages)
! 		num_buffer_dirty = maxpages;
  
  	/* Make sure we can handle the pin inside the loop */
  	ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
--- 689,719 
  	if (percent == 0 || maxpages == 0)
  		return 0;
  
+ /* Set number of buffers we will clean at LRUs of buffer lists */
+ if (percent > 0 ) {
+ 	Assert(percent <= 100);
+	dirty_buffers_maxlen = (NBuffers * percent + 99) / 100;
+ }
+ 	if (maxpages > 0 && dirty_buffers_maxlen > maxpages)
+ 	dirty_buffers_maxlen = maxpages;
+ 
+ /* if checkpoint time */
+ if (percent == -1 && maxpages == -1)
+ 	dirty_buffers_maxlen = NBuffers;
+ 
  	/*
! 	 * Get a list of dirty buffers to clean and how many there are.
  	 * We do not flush buffers that get dirtied after we started. They
! 	 * have to wait until the next call of this function
  	 */
! 	dirty_buffers = 
!  (BufferDesc **) palloc(dirty_buffers_maxlen * sizeof(BufferDesc *));
! 	buftags = (BufferTag *) palloc(dirty_buffers_maxlen * sizeof(BufferTag));
  
  	LWLockAcquire(BufMgrLock, LW_EXCLUSIVE);
  
!	num_buffer_dirty = StrategyDirtyBufferList(dirty_buffers, buftags,
! 			   dirty_buffers_maxlen);
  
  	/* Make sure we can handle the pin inside the loop */
  	ResourceOwnerEnlargeBuffers(CurrentResourceOwner);



Re: [HACKERS] Bgwriter behavior

2004-12-30 Thread Bruce Momjian
Simon Riggs wrote:
 On Mon, 2004-12-27 at 22:21, Bruce Momjian wrote:
  Should we consider at least adjusting the meaning of bgwriter_percent?
 
 Yes. As things stand, this is the only change that seems safe.
 
 Here's a very short patch that implements this change within BufferSync
 in bufmgr.c 
 
 - No algorithm changes
 - No error message changes
 - The only change is that the call to StrategyDirtyBufferList is made
 using the maximum number of buffers that will be cleaned, rather than
 uselessly trawling through all of shared_buffers
 
 This changes the meaning of bgwriter_percent from percent of dirty
 buffers to percent of shared_buffers. The default settings of 1% of
 1000 buffers give up to 10 dirty block writes every 250ms.
 
 Benefit: allows performance tuning by increasing the options for setting
 bgwriter_delay, which would otherwise have an ineffectually high minimum
 setting
 
 Risk: low
 
 1-line doc patch to follow, if this is approved.

I am not objecting to the patch, but what value is there in having both
bgwriter_percent and bgwriter_maxpages?  Seems both are redundant and
that one would be enough.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] Bgwriter behavior

2004-12-29 Thread Manfred Koizar
[I know I'm late and this has already been discussed by Richard, Tom,
et al., but ...]

On Tue, 21 Dec 2004 16:17:17 -0600, Jim C. Nasby
[EMAIL PROTECTED] wrote:
look at where the last page you wrote out has ended up in the LRU list
since you last ran, and start scanning from there (by definition
everything after that page would have to be clean).

This is a bit oversimplified, because that page will be moved to the
start of the list when it is accessed the next time.

  A = B = C = D = E = F = G = H = I = J = K = L = m = n = o = p = q
                                                  ^
would become

  M = A = B = C = D = E = F = G = H = I = J = K = L = n = o = p = q
  ^

(a-z ... known to be clean, A-Z ... possibly dirty)

But with a bit of cooperation from the backends this could be made to
work.  Whenever a backend takes the page which is the start of the
clean tail out of the list (most probably to insert it into another
list or to re-insert it at the start of the same list) the clean tail
pointer is advanced to the next list element, if any.  So we would get

  M = A = B = C = D = E = F = G = H = I = J = K = L = n = o = p = q
                                                      ^

As a little improvement the clean tail could be prevented from
shrinking unnecessarily fast by moving the pointer to the previous
list element if this is found to be clean:

  M = A = B = C = D = E = F = G = H = I = J = K = l = n = o = p = q
                                                  ^

Maybe this approach could serve both goals, (1) keeping a number of
clean pages at the LRU end of the list and (2) writing out other dirty
pages if there's not much to do near the end of the list.
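
A minimal sketch of the clean-tail pointer described above, assuming a
simple doubly linked LRU list. The types and function names are
hypothetical and do not correspond to the ARC structures in the backend;
locking and the cntxDirty issue are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Page
{
    struct Page *prev;       /* towards the start (MRU end) of the list */
    struct Page *next;       /* towards the LRU end of the list */
    bool         dirty;
} Page;

typedef struct
{
    Page *head;              /* start of the list */
    Page *tail;              /* LRU end of the list */
    Page *clean_tail;        /* first page of the known-clean tail, or NULL */
} LruList;

/* Take a page out of the list; if it is the start of the clean tail,
 * advance the clean tail pointer to the next list element, if any. */
void lru_unlink(LruList *l, Page *p)
{
    assert(p != NULL);
    if (l->clean_tail == p)
        l->clean_tail = p->next;
    if (p->prev) p->prev->next = p->next; else l->head = p->next;
    if (p->next) p->next->prev = p->prev; else l->tail = p->prev;
    p->prev = p->next = NULL;
}

/* Re-insert a page at the start of the list. */
void lru_push_front(LruList *l, Page *p)
{
    p->prev = NULL;
    p->next = l->head;
    if (l->head) l->head->prev = p; else l->tail = p;
    l->head = p;
}

/* Move the clean tail pointer one element back if that element is
 * found to be clean.  Returns true if the pointer moved. */
bool lru_step_clean_tail_back(LruList *l)
{
    Page *prev = l->clean_tail ? l->clean_tail->prev : l->tail;

    if (prev == NULL || prev->dirty)
        return false;
    l->clean_tail = prev;
    return true;
}
```

In this sketch the backend side only pays for one pointer comparison in
lru_unlink, while the writer calls lru_step_clean_tail_back after it has
written out the page just before the clean tail.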

But ...
On Tue, 21 Dec 2004 10:26:48 -0500, Tom Lane [EMAIL PROTECTED]
wrote:
Also, the cntxDirty mechanism allows a block to be dirtied without
changing the ARC state at all.

... which might kill this proposal anyway.

Servus
 Manfred




Re: [HACKERS] Bgwriter behavior

2004-12-28 Thread Bruce Momjian

Added to TODO:

* Improve the background writer

  Allow the background writer to more efficiently write dirty buffers
  from the end of the LRU cache and use a clock sweep algorithm to
  write other dirty buffers to reduce checkpoint I/O


---

Simon Riggs wrote:
 On Wed, 2004-12-22 at 04:43, Tom Lane wrote:
  Bruce Momjian pgman@candle.pha.pa.us writes:
   So what are we doing for 8.0?
  
  Well, it looks like RC2 has already crashed and burned --- I can't
  imagine that Marc will let us release without an RC3 given what was
  committed today, never mind the btree bug that Mark Wong seems to have
  found.  So maybe we should just bite the bullet and do something real
  about this.
  
  I'm willing to code up a proposed patch for the two-track idea I
  suggested, and if anyone else has a favorite maybe they could write
  something too.  But do we have the resources to test such patches and
  make a decision in the next few days?
  
  At the moment my inclination is to sit on what we have.  I've not seen
  any indication that 8.0 is really worse than earlier releases; the most
  you could argue against it is that it's not as much better as we hoped.
  That's not grounds to muck around at the RC3 stage.
 
 Agreed, if somewhat reluctantly.
 
 We may have the time to test, but it is clear that we do not have the
 time to validate those tests, then discuss and agree on the results.
 
 Time to go with what we have.
 
 [Mark's possible bug seems a higher priority for me.]
 
 -- 
 Best Regards, Simon Riggs
 
 
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] Bgwriter behavior

2004-12-28 Thread Bruce Momjian
John Hansen wrote:
  I ran some tests last week and can report results similar on Tom's test:
  
  pgbench -i -s 10 bench
  pgbench -c 10 -t 1 bench
  
  The tests were on a machine with a single SCSI drive that doesn't lie
  about fsync.  I found 7.4.X got around 75tps while 8.0 got 100tps, very
  similar to the 65/107 numbers Tom had.
 
 You do realize, that pgbench result comparisons are about as useful as a
 fork for eating soup?


 
 On another note, how do you know for sure, that your drive does not lie
 about fsync?

 
 Did you run the tests with fsync turned off vs fsync on?

I just tried and got 115tps with fsync off vs 100 with fsync on, so
fsync is certainly doing something.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] Bgwriter behavior

2004-12-28 Thread Tom Lane
Bruce Momjian pgman@candle.pha.pa.us writes:
 John Hansen wrote:
 On another note, how do you know for sure, that your drive does not lie
 about fsync?

 I just tried and got 115tps with fsync off vs 100 with fsync on, so
 fsync is certainly doing something.

[ raised eyebrow... ]  Something is wrong with that.  I'd expect a
*much* higher difference.  It's difficult to credit a tps rate higher
than your disk's RPM rating with fsync on, but most modern CPUs can do
a lot better than that with fsync off.  If you have a 7200 RPM drive
then I'd believe the 100 figure, but not the other ...
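
The reasoning can be put into numbers under one assumption that is not
spelled out above: a serial stream of commits, each waiting for its own
synchronous WAL write, can complete at most one commit per platter
rotation. The function name below is illustrative only.

```c
#include <assert.h>

/* Rule-of-thumb ceiling: one synchronous commit per platter rotation,
 * i.e. rotations per second (integer division truncates). */
int max_synced_commits_per_second(int rpm)
{
    assert(rpm > 0);
    return rpm / 60;
}
```

Under that assumption a 7200 RPM drive allows about 120 commits per
second, which is why a figure near 100 tps with fsync on looks credible
while 115 tps with fsync off looks far too low.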

regards, tom lane



Re: [HACKERS] Bgwriter behavior

2004-12-28 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian pgman@candle.pha.pa.us writes:
  John Hansen wrote:
  On another note, how do you know for sure, that your drive does not lie
  about fsync?
 
  I just tried and got 115tps with fsync off vs 100 with fsync on, so
  fsync is certainly doing something.
 
 [ raised eyebrow... ]  Something is wrong with that.  I'd expect a
 *much* higher difference.  It's difficult to credit a tps rate higher
 than your disk's RPM rating with fsync on, but most modern CPUs can do
 a lot better than that with fsync off.  If you have a 7200 RPM drive
 then I'd believe the 100 figure, but not the other ...

I think it is a 10k RPM drive, Seagate Cheetah ST336607LW.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] Bgwriter behavior

2004-12-28 Thread Simon Riggs
On Tue, 2004-12-28 at 07:23, John Hansen wrote:
  I ran some tests last week and can report results similar on Tom's test:
  
  pgbench -i -s 10 bench
  pgbench -c 10 -t 1 bench
  
  The tests were on a machine with a single SCSI drive that doesn't lie
  about fsync.  I found 7.4.X got around 75tps while 8.0 got 100tps, very
  similar to the 65/107 numbers Tom had.
 
 You do realize, that pgbench result comparisons are about as useful as a
 fork for eating soup?

I'd have to agree. I find it hard to get comparable results on my test
server, let alone discuss other people's findings.

The only tests I have reasonable faith in these days are those performed
to a rigorous test method, which is also published, visible and
challengeable. OSDL is the nearest thing we have to that.

-- 
Best Regards, Simon Riggs




Re: [HACKERS] Bgwriter behavior

2004-12-28 Thread John Hansen
   I ran some tests last week and can report results similar on Tom's test:
   
 pgbench -i -s 10 bench
 pgbench -c 10 -t 1 bench
   

don't you have to specify the scaling factor for the benchmark as well?
as in pgbench -c 10 -t 1 -s 10 bench ?

 I just tried and got 115tps with fsync off vs 100 with fsync on, so
 fsync is certainly doing something.

well, I usually get results that differ by that much from run to run.
Probably you ran into more checkpoints on the second test.

Also, did you reinitialize the bench database with pgbench -i ?

... John



Re: [HACKERS] Bgwriter behavior

2004-12-28 Thread Bruce Momjian
John Hansen wrote:
I ran some tests last week and can report results similar on Tom's test:

pgbench -i -s 10 bench
pgbench -c 10 -t 1 bench

 
 don't you have to specify the scaling factor for the benchmark as well?
 as in pgbench -c 10 -t 1 -s 10 bench ?
 
  I just tried and got 115tps with fsync off vs 100 with fsync on, so
  fsync is certainly doing something.
 
 well, I usually get results that differ by that much from run to run.
 Probably you ran into more checkpoints on the second test.
 
 Also, did you reinitialize the bench database with pgbench -i ?

I destroyed the database and recreated it.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073


Re: [HACKERS] Bgwriter behavior

2004-12-28 Thread Mark Kirkwood

Bruce Momjian wrote:
well, I usually get results that differ by that much from run to run.
Probably you ran into more checkpoints on the second test.
Also, did you reinitialize the bench database with pgbench -i ?
   

I destroyed the database and recreated it.
 

The only way I managed to control the variability in pgbench was to
*reboot the machine* and recreate the database for each test. In
addition, it seems that using a larger scale factor (e.g. 200) helped as well.

Having said that, on FreeBSD 5.3 with hw.ata.wc=0 (i.e. no write cache),
my results for s=200, t=1 and c=4 were 49 (+/- 0.5) tps for both
7.4.6 and 8.0.0RC1 - no measurable difference. If I reduced the number
of transactions to t=1000, then 7.4.6 jumped ahead by about 10 tps.

Bruce - are you able to try s=200? It would be interesting to see what 
your setup does.

regards
Mark

Re: [HACKERS] Bgwriter behavior

2004-12-27 Thread Bruce Momjian
Simon Riggs wrote:
 On Wed, 2004-12-22 at 04:43, Tom Lane wrote:
  Bruce Momjian pgman@candle.pha.pa.us writes:
   So what are we doing for 8.0?
  
  Well, it looks like RC2 has already crashed and burned --- I can't
  imagine that Marc will let us release without an RC3 given what was
  committed today, never mind the btree bug that Mark Wong seems to have
  found.  So maybe we should just bite the bullet and do something real
  about this.
  
  I'm willing to code up a proposed patch for the two-track idea I
  suggested, and if anyone else has a favorite maybe they could write
  something too.  But do we have the resources to test such patches and
  make a decision in the next few days?
  
  At the moment my inclination is to sit on what we have.  I've not seen
  any indication that 8.0 is really worse than earlier releases; the most
  you could argue against it is that it's not as much better as we hoped.
  That's not grounds to muck around at the RC3 stage.
 
 Agreed, if somewhat reluctantly.
 
 We may have the time to test, but it is clear that we do not have the
 time to validate those tests, then discuss and agree on the results.
 
 Time to go with what we have.

I ran some tests last week and can report results similar on Tom's test:

pgbench -i -s 10 bench
pgbench -c 10 -t 1 bench

The tests were on a machine with a single SCSI drive that doesn't lie
about fsync.  I found 7.4.X got around 75tps while 8.0 got 100tps, very
similar to the 65/107 numbers Tom had.

First, I am confused why we have such a large improvement in 8.0.  Does
anyone know?  This is a pretty long test so a 33-50% increase is a big
jump.

Second, I added a little code to my local tree to check whether the
pendingOpsTable overflows and register_dirty_segment() must have a local
backend do an fsync().  I found one pgbench test had 54 local fsyncs,
but the next test had none, and 54 isn't a very large number.

Should we emit a server log message when this happens so users can
reduce bgwriter_delay?

It seems having the backend do the writes is not so bad (same as 7.4.X)
and our only big problem with current bgwriter is the inability to
reduce checkpoint load for busy servers.

Should we consider at least adjusting the meaning of bgwriter_percent?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073


Re: [HACKERS] Bgwriter behavior

2004-12-27 Thread John Hansen
 I ran some tests last week and can report results similar on Tom's test:
 
   pgbench -i -s 10 bench
   pgbench -c 10 -t 1 bench
 
 The tests were on a machine with a single SCSI drive that doesn't lie
 about fsync.  I found 7.4.X got around 75tps while 8.0 got 100tps, very
 similar to the 65/107 numbers Tom had.

You do realize, that pgbench result comparisons are about as useful as a
fork for eating soup?

On another note, how do you know for sure, that your drive does not lie
about fsync?

Did you run the tests with fsync turned off vs fsync on?

 First, I am confused why we have such a large improvement in 8.0.  Does
 anyone know?  This is a pretty long test so a 33-50% increase is a big
 jump.

The bgwriter is responsible, I imagine. I experienced the same improvement
in an early 7.5, just after the bgwriter was added
(though my results were about 4-5 times higher in terms of tps, hehe).

... John



Re: [HACKERS] Bgwriter behavior

2004-12-23 Thread Tom Lane
Bruce Momjian pgman@candle.pha.pa.us writes:
 I remember the other difference between 8.0 and pre-8.0.  When a backend
 has to write a block in 8.0, it does a write _plus_ fsync(), while in
 pre-8.0 it did only a write.  There was a proposal to pass backend write
 information to the background writer so it would know to fsync at
 checkpoint, but it was decided that backend writing would be rare.  I
 think we have to rethink that assumption.

No, just read the code.  The above assertions are all wet.

regards, tom lane


Re: [HACKERS] Bgwriter behavior

2004-12-23 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian pgman@candle.pha.pa.us writes:
  I remember the other difference between 8.0 and pre-8.0.  When a backend
  has to write a block in 8.0, it does a write _plus_ fsync(), while in
  pre-8.0 it did only a write.  There was a proposal to pass backend write
  information to the background writer so it would know to fsync at
  checkpoint, but it was decided that backend writing would be rare.  I
  think we have to rethink that assumption.
 
 No, just read the code.  The above assertions are all wet.

Oh, I forgot you added that array to pass fsync info.

Shouldn't we send a log message when the array gets full in md.c:

    {
        if (ForwardFsyncRequest(reln->smgr_rnode, seg->mdfd_segno))
            return true;
    }

    if (FileSync(seg->mdfd_vfd) < 0)
        return false;

Seems that could fill up quickly.  I see no checking for existing
matching records in the array.
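
The concern about the array filling up can be illustrated with a minimal sketch. It is not the PostgreSQL implementation: the types FsyncRequest and FsyncQueue, the capacity MAX_REQUESTS, and the function forward_request are all hypothetical, and the relation is reduced to a single integer id. It shows a fixed-size request array that skips entries already queued and reports failure when full, at which point the caller would have to do the fsync itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REQUESTS 4          /* hypothetical capacity, tiny for illustration */

typedef struct
{
    int rel_id;                 /* stand-in for the relation's file node */
    int segno;                  /* segment number within the relation */
} FsyncRequest;

typedef struct
{
    FsyncRequest requests[MAX_REQUESTS];
    int          num_requests;
} FsyncQueue;

/*
 * Queue an fsync request.  Returns true if the request is queued (or was
 * already present), false if the array is full and the caller must do
 * the fsync itself.
 */
bool
forward_request(FsyncQueue *q, int rel_id, int segno)
{
    int i;

    assert(q != NULL);

    /* Duplicate check: one pending entry per segment is enough. */
    for (i = 0; i < q->num_requests; i++)
    {
        if (q->requests[i].rel_id == rel_id && q->requests[i].segno == segno)
            return true;
    }

    if (q->num_requests >= MAX_REQUESTS)
        return false;           /* full: caller falls back to a local fsync */

    q->requests[q->num_requests].rel_id = rel_id;
    q->requests[q->num_requests].segno = segno;
    q->num_requests++;
    return true;
}
```

With the duplicate check, repeated writes to the same segment occupy one slot, so the array fills only when many distinct segments are dirtied between drains. The linear search is itself a cost, so whether it pays off would need measuring.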

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073


Re: [HACKERS] Bgwriter behavior

2004-12-23 Thread Simon Riggs
On Wed, 2004-12-22 at 04:43, Tom Lane wrote:
 Bruce Momjian pgman@candle.pha.pa.us writes:
  So what are we doing for 8.0?
 
 Well, it looks like RC2 has already crashed and burned --- I can't
 imagine that Marc will let us release without an RC3 given what was
 committed today, never mind the btree bug that Mark Wong seems to have
 found.  So maybe we should just bite the bullet and do something real
 about this.
 
 I'm willing to code up a proposed patch for the two-track idea I
 suggested, and if anyone else has a favorite maybe they could write
 something too.  But do we have the resources to test such patches and
 make a decision in the next few days?
 
 At the moment my inclination is to sit on what we have.  I've not seen
 any indication that 8.0 is really worse than earlier releases; the most
 you could argue against it is that it's not as much better as we hoped.
 That's not grounds to muck around at the RC3 stage.

Agreed, if somewhat reluctantly.

We may have the time to test, but it is clear that we do not have the
time to validate those tests, then discuss and agree on the results.

Time to go with what we have.

[Mark's possible bug seems a higher priority for me.]

-- 
Best Regards, Simon Riggs



Re: [HACKERS] Bgwriter behavior

2004-12-22 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian pgman@candle.pha.pa.us writes:
  So what are we doing for 8.0?
 
 Well, it looks like RC2 has already crashed and burned --- I can't
 imagine that Marc will let us release without an RC3 given what was
 committed today, never mind the btree bug that Mark Wong seems to have
 found.  So maybe we should just bite the bullet and do something real
 about this.
 
 I'm willing to code up a proposed patch for the two-track idea I
 suggested, and if anyone else has a favorite maybe they could write
 something too.  But do we have the resources to test such patches and
 make a decision in the next few days?
 
 At the moment my inclination is to sit on what we have.  I've not seen
 any indication that 8.0 is really worse than earlier releases; the most
 you could argue against it is that it's not as much better as we hoped.
 That's not grounds to muck around at the RC3 stage.

I remember the other difference between 8.0 and pre-8.0.  When a backend
has to write a block in 8.0, it does a write _plus_ fsync(), while in
pre-8.0 it did only a write.  There was a proposal to pass backend write
information to the background writer so it would know to fsync at
checkpoint, but it was decided that backend writing would be rare.  I
think we have to rethink that assumption.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073


[HACKERS] Bgwriter behavior

2004-12-21 Thread Bruce Momjian
Tom Lane wrote:
 Gavin Sherry [EMAIL PROTECTED] writes:
  I was also thinking of benchmarking the effect of changing the algorithm
  in StrategyDirtyBufferList(): currently, for each iteration of the loop we
  read a buffer from each of T1 and T2. I was wondering what effect reading
  T1 first then T2 and vice versa would have on performance.
 
 Looking at StrategyGetBuffer, it definitely seems like a good idea to
 try to keep the bottom end of both T1 and T2 lists clean.  But we should
 work at T1 a bit harder.
 
 The insight I take away from today's discussion is that there are two
 separate goals here: try to keep backends that acquire a buffer via
 StrategyGetBuffer from being fed a dirty buffer they have to write,
 and try to keep the next upcoming checkpoint from having too much work
 to do.  Those are both laudable goals but I hadn't really seen before
 that they may require different strategies to achieve.  I'm liking the
 idea that bgwriter should alternate between doing writes in pursuit of
 the one goal and doing writes in pursuit of the other.

It seems we have added a new limitation to bgwriter by not doing a full
scan.  With a full scan we could easily grab the first X pages starting
from the end of the LRU list and write them.  By not scanning the full
list we are opening the possibility of not seeing some of the front-most
LRU dirty pages.  And the full scan was removed so we can run bgwriter
more frequently, but we might end up with other problems.

I have a new proposal.  The idea is to cause bgwriter to increase its
frequency based on how quickly it finds dirty pages.

First, we remove the GUC bgwriter_maxpages because I don't see a good
way to set a default for that.  A default value needs to be based on a
percentage of the full buffer cache size.  Second, we make
bgwriter_percent cause the bgwriter to stop its scan once it has found a
number of dirty buffers that matches X% of the buffer cache size.  So,
if it is set to 5%, the bgwriter scan stops once it finds enough dirty
buffers to equal 5% of the buffer cache size. 

Bgwriter continues to scan starting from the end of the LRU list, just
like it does now.
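
The stopping rule proposed here can be sketched as follows. This is only an illustration under stated assumptions: the buffer cache is modelled as an array of dirty flags with index 0 taken to be the LRU end, and the function name scan_for_dirty is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Scan dirty[] starting at index 0 (taken here to be the LRU end) and
 * stop once enough dirty buffers have been found to equal `percent` of
 * the whole cache.  Returns the number of buffers examined; *found
 * receives the number of dirty buffers seen.
 */
size_t
scan_for_dirty(const bool *dirty, size_t nbuffers, int percent, size_t *found)
{
    size_t target;
    size_t scanned = 0;

    assert(dirty != NULL && found != NULL);
    assert(percent >= 0 && percent <= 100);

    target = nbuffers * (size_t) percent / 100;
    *found = 0;

    while (scanned < nbuffers && *found < target)
    {
        if (dirty[scanned])
            (*found)++;
        scanned++;
    }
    return scanned;
}
```

When dirty buffers cluster at the LRU end the scan ends after a few buffers; when they are scarce it degenerates into a full scan of the cache.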

Now, to control the bgwriter frequency we multiply the percent of the
list it had to span by the bgwriter_delay value to determine when to run
bgwriter next.  For example, if you find enough dirty pages by looking
at only 10% of the buffer cache, you multiply 10% (0.10) * bgwriter_delay
and that is when you run next.  If you have to scan 50%, bgwriter runs
next at 50% (0.50) * bgwriter_delay, and if it has to scan the entire
list it is 100% (1.00) * bgwriter_delay.

What this does is to cause bgwriter to run more frequently when there
are a lot of dirty buffers on the end of the LRU _and_ when the bgwriter
scan will be quick.  When there are few writes, bgwriter will run less
frequently but will write dirty buffers nearer to the head of the LRU.
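
The delay rule described above amounts to one multiplication. A minimal sketch, assuming the delay is kept in milliseconds and the scan reports how many buffers it examined; the function name next_delay_ms is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Delay before the next bgwriter run under the proposal: the configured
 * delay scaled by the fraction of the buffer cache that had to be
 * scanned.  Integer arithmetic avoids floating-point rounding.
 */
int
next_delay_ms(size_t scanned, size_t nbuffers, int bgwriter_delay_ms)
{
    assert(nbuffers > 0 && scanned <= nbuffers);
    assert(bgwriter_delay_ms >= 0);
    return (int) ((unsigned long long) bgwriter_delay_ms * scanned / nbuffers);
}
```

With an example bgwriter_delay of 200 ms and a 1000-buffer cache, scanning 100 buffers gives a 20 ms wait, 500 buffers gives 100 ms, and a full scan gives the whole 200 ms.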

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073


Re: [HACKERS] Bgwriter behavior

2004-12-21 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 First, we remove the GUC bgwriter_maxpages because I don't see a good
 way to set a default for that.  A default value needs to be based on a
 percentage of the full buffer cache size.

This is nonsense.  The admin knows what he set shared_buffers to, and so
maxpages and percent of shared buffers are not really distinct ways of
specifying things.  The cases that make a percent spec useful are if
(a) it is a percent of a non-constant number (eg, percent of total dirty
pages as in the current code), or (b) it is defined in a way that lets
it limit the amount of scanning work done (which it isn't useful for in
the current code).  But a maxpages spec is useful for (b) too.  More to
the point, maxpages is useful to set a hard limit on the amount of I/O
generated by the bgwriter, and I think people will want to be able to do
that.
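
The point that the two settings are interchangeable for a fixed shared_buffers comes down to simple arithmetic. A small sketch, with the hypothetical helper name percent_to_pages:

```c
#include <assert.h>

/*
 * Convert a percentage of shared_buffers into an absolute page count.
 * For a fixed shared_buffers, each percent value maps to one page limit.
 */
int
percent_to_pages(int shared_buffers, int percent)
{
    assert(shared_buffers >= 0);
    assert(percent >= 0 && percent <= 100);
    return (int) ((long long) shared_buffers * percent / 100);
}
```

For example, with shared_buffers set to 1000 (an illustrative value), 5% corresponds to a limit of 50 pages. The two forms only diverge when shared_buffers itself changes.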

 Now, to control the bgwriter frequency we multiply the percent of the
 list it had to span by the bgwriter_delay value to determine when to run
 bgwriter next.

I'm less than enthused about this.  The idea of the bgwriter is to
trickle out writes in a way that doesn't affect overall performance too
much.  Not to write everything in sight at any cost.

I like the hybrid "keep the bottom of the ARC list clean, plus do a slow
clock scan on the main buffer array" approach better.  I can see that
that directly impacts both of the goals that the bgwriter has.  I don't
see how a variable I/O rate really improves life on either score; it
just makes things harder to predict.
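
One possible reading of the hybrid approach, sketched under loud assumptions: this is not a proposed patch, the names BufferPool and bgwriter_round are hypothetical, the replacement list is passed in as an array of buffer ids ordered least recently used first, and "writing" a buffer is modelled as clearing its dirty flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct
{
    bool   *dirty;          /* dirty flag per buffer, indexed by buffer id */
    size_t  nbuffers;
    size_t  clock_hand;     /* next buffer id for the slow clock scan */
} BufferPool;

/*
 * One bgwriter round.  Track 1 cleans the first lru_depth buffers named
 * in lru_order[] (buffer ids, least recently used first).  Track 2
 * advances a clock hand over the whole array by clock_step buffers.
 * Returns the number of buffers written (here: flags cleared).
 */
size_t
bgwriter_round(BufferPool *pool, const size_t *lru_order,
               size_t lru_depth, size_t clock_step)
{
    size_t written = 0;
    size_t i;

    assert(pool != NULL && pool->nbuffers > 0);
    assert(lru_depth <= pool->nbuffers);

    /* Track 1: keep the replacement end of the list clean. */
    for (i = 0; i < lru_depth; i++)
    {
        size_t id = lru_order[i];

        assert(id < pool->nbuffers);
        if (pool->dirty[id])
        {
            pool->dirty[id] = false;
            written++;
        }
    }

    /* Track 2: slow clock scan to spread out checkpoint work. */
    for (i = 0; i < clock_step; i++)
    {
        size_t id = pool->clock_hand;

        if (pool->dirty[id])
        {
            pool->dirty[id] = false;
            written++;
        }
        pool->clock_hand = (pool->clock_hand + 1) % pool->nbuffers;
    }
    return written;
}
```

In this sketch the writes per round can never exceed lru_depth + clock_step, so the I/O rate stays bounded and predictable, and each track serves one of the two goals.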

regards, tom lane


Re: [HACKERS] Bgwriter behavior

2004-12-21 Thread Jim C. Nasby
A quick $0.02 on how DB2 does this (at least in 7.x).

They used a combination of everything that's been discussed. The first
priority of their background writer was to keep the LRU end of the cache
free so individual backends would never have to wait to get a page.
Then, they would look to pages that had been dirty for 'a long time',
which was user configurable. Pages older than this setting were
candidates to be written out even if they weren't close to LRU. Finally,
I believe there were also settings for how often the writer would fire
up, and how much work it would do at once.

I agree that the first priority should be to keep clean pages near LRU,
but that you also don't want to get hammered at checkpoint time. I think
what might be interesting to consider is keeping a list of dirty pages,
which would remove the need to scan a very large buffer. Of course, in
an environment with a heavy update load, it could be better to just
scan the buffers, especially if you don't do a clock-sweep but instead
look at where the last page you wrote out has ended up in the LRU list
since you last ran, and start scanning from there (by definition
everything after that page would have to be clean). Of course this is
just conjecture on my part and would need testing to verify, and it's
obviously beyond the scope of 8.0.

As for 8.0, I suspect at this point it's probably best to just go with
whatever method has the smallest amount of code impact unless it's
inherently broken.
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?


Re: [HACKERS] Bgwriter behavior

2004-12-21 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  First, we remove the GUC bgwriter_maxpages because I don't see a good
  way to set a default for that.  A default value needs to be based on a
  percentage of the full buffer cache size.
 
 This is nonsense.  The admin knows what he set shared_buffers to, and so
 maxpages and percent of shared buffers are not really distinct ways of
 specifying things.  The cases that make a percent spec useful are if
 (a) it is a percent of a non-constant number (eg, percent of total dirty
 pages as in the current code), or (b) it is defined in a way that lets
 it limit the amount of scanning work done (which it isn't useful for in
 the current code).  But a maxpages spec is useful for (b) too.  More to
 the point, maxpages is useful to set a hard limit on the amount of I/O
 generated by the bgwriter, and I think people will want to be able to do
 that.

I figured that if we specify a percentage, users would not need to update
this value whenever they increase their shared buffers.  I agree that if
you want to limit total I/O by the bgwriter, an actual page count is
better, but I assumed we were looking for bgwriter to do a certain
percentage of total writes.  If the system is doing a lot of writes, then
limiting the bgwriter doesn't help because the backends are then going
to have to do the writes themselves.

  Now, to control the bgwriter frequency we multiply the percent of the
  list it had to span by the bgwriter_delay value to determine when to run
  bgwriter next.
 
 I'm less than enthused about this.  The idea of the bgwriter is to
 trickle out writes in a way that doesn't affect overall performance too
 much.  Not to write everything in sight at any cost.

No question my idea makes tuning difficult.  I was hoping it would be
self-tuning but I am not sure.

 I like the hybrid "keep the bottom of the ARC list clean, plus do a slow
 clock scan on the main buffer array" approach better.  I can see that
 that directly impacts both of the goals that the bgwriter has.  I don't
 see how a variable I/O rate really improves life on either score; it
 just makes things harder to predict.

So what are we doing for 8.0?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073


Re: [HACKERS] Bgwriter behavior

2004-12-21 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 So what are we doing for 8.0?

Well, it looks like RC2 has already crashed and burned --- I can't
imagine that Marc will let us release without an RC3 given what was
committed today, never mind the btree bug that Mark Wong seems to have
found.  So maybe we should just bite the bullet and do something real
about this.

I'm willing to code up a proposed patch for the two-track idea I
suggested, and if anyone else has a favorite maybe they could write
something too.  But do we have the resources to test such patches and
make a decision in the next few days?

At the moment my inclination is to sit on what we have.  I've not seen
any indication that 8.0 is really worse than earlier releases; the most
you could argue against it is that it's not as much better as we hoped.
That's not grounds to muck around at the RC3 stage.

regards, tom lane


Re: [HACKERS] Bgwriter behavior

2004-12-21 Thread Joshua D. Drake

At the moment my inclination is to sit on what we have.  I've not seen
any indication that 8.0 is really worse than earlier releases; the most
you could argue against it is that it's not as much better as we hoped.
That's not grounds to muck around at the RC3 stage.
 

If it is any help, CMD is basically dead right now and I expect
it will be that way until the new year. 4 of my 5 C programmers
are on vacation but I do have one and a couple of non-C programmers.
We can't fix, but we can definitely help test.
Sincerely,
Joshua D. Drake



--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL



Re: [HACKERS] Bgwriter behavior

2004-12-21 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  So what are we doing for 8.0?
 
 Well, it looks like RC2 has already crashed and burned --- I can't
 imagine that Marc will let us release without an RC3 given what was
 committed today, never mind the btree bug that Mark Wong seems to have
 found.  So maybe we should just bite the bullet and do something real
 about this.

Oh, is it that bad?

 I'm willing to code up a proposed patch for the two-track idea I
 suggested, and if anyone else has a favorite maybe they could write
 something too.  But do we have the resources to test such patches and
 make a decision in the next few days?
 
 At the moment my inclination is to sit on what we have.  I've not seen
 any indication that 8.0 is really worse than earlier releases; the most
 you could argue against it is that it's not as much better as we hoped.
 That's not grounds to muck around at the RC3 stage.

That was my question.  It seems bgwriter is fine for low to medium
traffic but doesn't handle high traffic, and increasing the scan rate
makes things worse.

I am fine with doing nothing, but if we are going to do something, I
would like to do it now rather than later.

The only way I could see it being worse than pre-8.0 is that the
bgwriter is doing fsync of all open files rather than using sync. Other
than that, I think it should behave the same, or slightly better, 
right?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073


Re: [HACKERS] Bgwriter behavior

2004-12-21 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 The only way I could see it being worse than pre-8.0 is that the
 bgwriter is doing fsync of all open files rather than using sync. Other
 than that, I think it should behave the same, or slightly better, 
 right?

It's possible that there exist platforms on which this is a loss ---
that is, the OS's handling of fsync is so inefficient that multiple
fsync calls are worse than one sync call even though less I/O is forced.
But I haven't seen any actual evidence of that; and if such platforms
do exist I'm not sure I'd blink anyway.  We are not required to optimize
for brain-dead kernels.
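
The two strategies being compared can be sketched with the POSIX calls fsync() and sync(). The wrapper names flush_files and flush_everything are hypothetical, and this is an illustration of the trade-off rather than the PostgreSQL checkpoint code.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Flush only the listed file descriptors.  Returns 0 on success, or -1
 * as soon as one fsync() fails.
 */
int
flush_files(const int *fds, size_t nfds)
{
    size_t i;

    assert(fds != NULL || nfds == 0);
    for (i = 0; i < nfds; i++)
    {
        if (fsync(fds[i]) != 0)
            return -1;
    }
    return 0;
}

/* Flush everything the kernel has buffered, related or not. */
void
flush_everything(void)
{
    sync();
}
```

The per-file version issues one system call per file but forces out only the database's own data, while sync() is a single call that also flushes unrelated dirty data. Which one is cheaper on a given kernel is the open question in this message.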

regards, tom lane
