[PATCHES] Change BgWriterCommLock to spinlock

2006-01-08 Thread Qingqing Zhou

The following patch changes BgWriterCommLock to a spinlock. To conform to
the spinlock coding rules (though not strictly needed, since we are in a
critical section), it rewrites the requests array as a circular one, which
reduces the time the lock is held while the bgwriter absorbs the requests.
A byproduct benefit is that we can avoid the critical sections in
AbsorbFsyncRequests(), since we only remove requests once they have been
handled.

The concurrency control of the circular array relies on the fact that there
is only a single reader, but I don't see any future need to change that.

Regards,
Qingqing

---

Index: backend/postmaster/bgwriter.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/postmaster/bgwriter.c,v
retrieving revision 1.22
diff -c -r1.22 bgwriter.c
*** backend/postmaster/bgwriter.c   8 Dec 2005 19:19:22 -   1.22
--- backend/postmaster/bgwriter.c   8 Jan 2006 03:36:51 -
***
*** 56,61 
--- 56,62 
  #include "storage/ipc.h"
  #include "storage/pmsignal.h"
  #include "storage/smgr.h"
+ #include "storage/spin.h"
  #include "tcop/tcopprot.h"
  #include "utils/guc.h"
  #include "utils/memutils.h"
***
*** 94,101 
   * reasonable, it should set this field TRUE just before sending the signal.
   *
   * The requests array holds fsync requests sent by backends and not yet
!  * absorbed by the bgwriter.  Unlike the checkpoint fields, the requests
!  * fields are protected by BgWriterCommLock.
   *--
   */
  typedef struct
--- 95,101 
   * reasonable, it should set this field TRUE just before sending the signal.
   *
   * The requests array holds fsync requests sent by backends and not yet
!  * absorbed by the bgwriter.
   *--
   */
  typedef struct
***
*** 115,126 

sig_atomic_t ckpt_time_warn;/* warn if too soon since last ckpt? */

!   int num_requests;   /* current # of requests */
!   int max_requests;   /* allocated array size */
BgWriterRequest requests[1];/* VARIABLE LENGTH ARRAY */
  } BgWriterShmemStruct;

! static BgWriterShmemStruct *BgWriterShmem;

  /*
   * GUC parameters
--- 115,131 

sig_atomic_t ckpt_time_warn;/* warn if too soon since last ckpt? */

!	/* The following implements a circular array */
!	slock_t		req_lock;		/* protects the array */
!	int			req_oldest;		/* start of requests */
!	int			req_newest;		/* end of requests */
!	int			num_requests;	/* current number of requests */
!	int			max_requests;	/* allocated array size */
BgWriterRequest requests[1];/* VARIABLE LENGTH ARRAY */
  } BgWriterShmemStruct;

! /* use volatile pointer to prevent code rearrangement */
! static volatile BgWriterShmemStruct *BgWriterShmem;

  /*
   * GUC parameters
***
*** 528,535 
--- 533,542 
	if (found)
		return;					/* already initialized */

+	/* Init the circular array */
	MemSet(BgWriterShmem, 0, sizeof(BgWriterShmemStruct));
	BgWriterShmem->max_requests = NBuffers;
+	SpinLockInit(&BgWriterShmem->req_lock);
  }

  /*
***
*** 638,660 
  bool
  ForwardFsyncRequest(RelFileNode rnode, BlockNumber segno)
  {
!	BgWriterRequest *request;

	if (!IsUnderPostmaster)
		return false;			/* probably shouldn't even get here */
	Assert(BgWriterShmem != NULL);

!	LWLockAcquire(BgWriterCommLock, LW_EXCLUSIVE);
	if (BgWriterShmem->bgwriter_pid == 0 ||
		BgWriterShmem->num_requests >= BgWriterShmem->max_requests)
	{
!		LWLockRelease(BgWriterCommLock);
		return false;
	}
!	request = &BgWriterShmem->requests[BgWriterShmem->num_requests++];
	request->rnode = rnode;
	request->segno = segno;
!	LWLockRelease(BgWriterCommLock);
return true;
  }

--- 645,673 
  bool
  ForwardFsyncRequest(RelFileNode rnode, BlockNumber segno)
  {
!	volatile BgWriterRequest *request;

	if (!IsUnderPostmaster)
		return false;			/* probably shouldn't even get here */
	Assert(BgWriterShmem != NULL);

!	SpinLockAcquire(&BgWriterShmem->req_lock);
	if (BgWriterShmem->bgwriter_pid == 0 ||
		BgWriterShmem->num_requests >= BgWriterShmem->max_requests)
	{
!		SpinLockRelease(&BgWriterShmem->req_lock);
		return false;
	}
!	request = &BgWriterShmem->requests[BgWriterShmem->req_newest++];
!	BgWriterShmem->num_requests++;
	request->rnode = rnode;
	request->segno = segno;
!
!	/* handle wrap around */
!	if (BgWriterShmem->req_newest >= BgWriterShmem->max_requests)
!	

Re: [PATCHES] Change BgWriterCommLock to spinlock

2006-01-08 Thread Tom Lane
Qingqing Zhou [EMAIL PROTECTED] writes:
 The following patch changes BgWriterCommLock to a spinlock.

Why is this a good idea?

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] Change BgWriterCommLock to spinlock

2006-01-08 Thread Qingqing Zhou


On Sun, 8 Jan 2006, Tom Lane wrote:

 Why is this a good idea?


In the spirit of incremental improvement:
(1) The spinlock itself is lighter weight than the LWLock here, so we can
reduce the lock contention a little bit in AbsorbFsyncRequests();
(2) We don't need the CRITICAL SECTION in AbsorbFsyncRequests() any more.

Regards,
Qingqing



Re: [PATCHES] Change BgWriterCommLock to spinlock

2006-01-08 Thread Tom Lane
Qingqing Zhou [EMAIL PROTECTED] writes:
 On Sun, 8 Jan 2006, Tom Lane wrote:
 Why is this a good idea?

 In the spirit of incremental improvement:
 (1) The spinlock itself is lighter weight than the LWLock here, so we can
 reduce the lock contention a little bit in AbsorbFsyncRequests();

Spinlock-based coding is inherently much more fragile than LWLock-based
coding.  I'm against changing things in that direction unless a
substantial performance improvement can be gained.  You didn't offer
any evidence of improvement at all.

 (2) We don't need the CRITICAL SECTION in AbsorbFsyncRequests() any more.

Really?  I think this coding still breaks, rather badly, if
RememberFsyncRequest fails.  Certainly the reasons for needing a
critical section have nothing to do with what kind of lock is used.

regards, tom lane



Re: [PATCHES] Change BgWriterCommLock to spinlock

2006-01-08 Thread Qingqing Zhou


On Sun, 8 Jan 2006, Tom Lane wrote:

  (1) The spinlock itself is lighter weight than the LWLock here, so we
  can reduce the lock contention a little bit in AbsorbFsyncRequests();

 Spinlock-based coding is inherently much more fragile than LWLock-based
 coding.  I'm against changing things in that direction unless a
 substantial performance improvement can be gained.  You didn't offer
 any evidence of improvement at all.

Yeah, theoretically there are only some marginal performance improvements.
Maybe you'd suggest we keep the LWLock but use the circular-array part?

  (2) We don't need the CRITICAL SECTION in AbsorbFsyncRequests() any more.

 Really?  I think this coding still breaks, rather badly, if
 RememberFsyncRequest fails.  Certainly the reasons for needing a
 critical section have nothing to do with what kind of lock is used.

Yeah, it's not related to the lock. But I changed the algorithm to a
circular array as well and noticed there is only one reader, so we can
remove the requests only after we have successfully handled them. In
other words, if there is a problem, we never remove the unhandled requests.

Regards,
Qingqing




Re: [PATCHES] Change BgWriterCommLock to spinlock

2006-01-08 Thread Tom Lane
Qingqing Zhou [EMAIL PROTECTED] writes:
 Yeah, theoretically there are only some marginal performance improvements.
 Maybe you'd suggest we keep the LWLock but use the circular-array part?

They're separable issues anyway.

 Yeah, it's not related to the lock. But I changed the algorithm to a
 circular array as well and noticed there is only one reader, so we can
 remove the requests only after we have successfully handled them. In
 other words, if there is a problem, we never remove the unhandled requests.

I doubt that's more robust than before though.  If we did run out of
memory in the hashtable, this coding would essentially block further
operation of the request queue, because it would keep trying to
re-insert all the entries in the queue, and keep failing.

If you want the bgwriter to keep working in the face of an out-of-memory
condition in the hashtable, I think you'd have to change the coding so
that it takes requests one at a time from the queue.  Then, as
individual requests get removed from the hashtable, individual new
requests could get put in, even if the total queue is too large to fit
all at once.  However this would result in much greater lock-acquisition
overhead, so it doesn't really seem like a win.  We haven't seen any
evidence that the current coding has a problem under field conditions.

Another issue to keep in mind is that correct operation requires that
the bgwriter not declare a checkpoint complete until it's completed
every fsync request that was queued before the checkpoint started.
So if the bgwriter is to try to keep going after failing to absorb
all the pending requests, there would have to be some logic addition
to keep track of whether it's OK to complete a checkpoint or not.

regards, tom lane



Re: [PATCHES] Change BgWriterCommLock to spinlock

2006-01-08 Thread Qingqing Zhou


On Sun, 8 Jan 2006, Tom Lane wrote:

 If you want the bgwriter to keep working in the face of an out-of-memory
 condition in the hashtable, I think you'd have to change the coding so
 that it takes requests one at a time from the queue.

The patched version will issue an ERROR instead of a PANIC in this
condition, so the bgwriter can still keep running. I don't quite understand
what you mean by wanting the bgwriter to keep working -- do you mean not
issuing an ERROR but retrying? An ERROR is not avoidable unless we change
the out-of-memory handling logic inside the hashtable.


 Another issue to keep in mind is that correct operation requires that
 the bgwriter not declare a checkpoint complete until it's completed
 every fsync request that was queued before the checkpoint started.
 So if the bgwriter is to try to keep going after failing to absorb
 all the pending requests, there would have to be some logic addition
 to keep track of whether it's OK to complete a checkpoint or not.

As above, if the bgwriter fails to absorb, it will quit the job (and the
checkpoint will not be finished).

Do you think it makes sense to continue working on the patch, or should we
let it be?

Regards,
Qingqing



Re: [PATCHES] Change BgWriterCommLock to spinlock

2006-01-08 Thread Tom Lane
Qingqing Zhou [EMAIL PROTECTED] writes:
 Do you think it makes sense to continue working on the patch, or should
 we let it be?

I don't see any problem here that urgently needs solving.  If we ever
see any reports of out-of-memory failures in the bgwriter, then it'll
be time to worry about this, but I think it quite unlikely that we
ever will.  (Even if we do, a simpler answer would be to increase
the initial size of the pending-requests hashtable.)

regards, tom lane



Re: [PATCHES] Change BgWriterCommLock to spinlock

2006-01-08 Thread Qingqing Zhou


On Sun, 8 Jan 2006, Tom Lane wrote:

 I don't see any problem here that urgently needs solving.  If we ever
 see any reports of out-of-memory failures in the bgwriter, then it'll
 be time to worry about this, but I think it quite unlikely that we
 ever will.  (Even if we do, a simpler answer would be to increase
 the initial size of the pending-requests hashtable.)


Agreed.

Regards,
Qingqing
