Re: [HACKERS] LWLock statistics collector (was: CSStorm occurred again by postgreSQL8.2)

2006-08-04 Thread Katsuhiko Okano
Tom Lane wrote:
> I'm confused ... is this patch being proposed for inclusion?  I
> understood your previous message to say that it didn't help much.
This patch is only meant to narrow down where the problem is.

> The patch is buggy as posted, because it will try to do this:
>     if (shared->page_status[bestslot] == SLRU_PAGE_CLEAN)
>         return bestslot;
> while bestslot could still be -1.
Understood; a check is required.

> (They
> will pick a different buffer, because the guy who got the buffer will
> have done SlruRecentlyUsed on it before releasing the control lock ---
> so I don't believe the worry that we get a buffer thrash scenario here.
> Look at the callers of SlruSelectLRUPage not just the function itself.)
Hmm, I will read the code again.

> otherwise to initiate I/O on the oldest buffer that isn't
> either clean or write-busy, if there is one;
This point is important, but I find it difficult to understand.

Katsuhiko Okano
okano katsuhiko _at_ oss ntt co jp

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] LWLock statistics collector (was: CSStorm occurred again by postgreSQL8.2)

2006-08-03 Thread Tom Lane
Katsuhiko Okano [EMAIL PROTECTED] writes:
> (A) The buffer replacement algorithm is bad.
> The timestamp of a page being swapped out is not refreshed until
> the swap-out completes.
> If other pages are accessed while the swap-out is in progress, the
> page being swapped out becomes the least recently used one, and it
> is chosen again as the page to swap out.
> (When the workload is high.)
>
> The following is the patch.

I'm confused ... is this patch being proposed for inclusion?  I
understood your previous message to say that it didn't help much.

The patch is buggy as posted, because it will try to do this:
    if (shared->page_status[bestslot] == SLRU_PAGE_CLEAN)
        return bestslot;
while bestslot could still be -1.

I see your concern about multiple processes selecting the same buffer
for replacement, but what will actually happen is that all but the first
will block for the first one's I/O to complete using SimpleLruWaitIO,
and then all of them will repeat the outer loop and recheck what to do.
If they were all trying to swap in the same page this is actually
optimal.  If they were trying to swap in different pages then the losing
processes will again try to initiate I/O on a different buffer.  (They
will pick a different buffer, because the guy who got the buffer will
have done SlruRecentlyUsed on it before releasing the control lock ---
so I don't believe the worry that we get a buffer thrash scenario here.
Look at the callers of SlruSelectLRUPage not just the function itself.)

It's possible that letting different processes initiate I/O on different
buffers would be a win, but it might just result in excess writes,
depending on the relative probability of requests for the same page
vs. requests for different pages.

Also, I think the patch as posted would still cause processes to gang up
on the same buffer, it would just be a different one from before.  The
right thing would be to locate the overall-oldest buffer and return it
if clean; otherwise to initiate I/O on the oldest buffer that isn't
either clean or write-busy, if there is one; otherwise just do WaitIO
on the oldest buffer.  This would ensure that different processes try
to push different buffers to disk.  They'd still go back and make their
decisions from the top after doing their I/O.  Whether this is a win or
not is not clear to me, but at least it would attack the guessed-at
problem correctly.
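
The selection policy described above can be sketched in C as follows. This is only an illustration under stated assumptions, not code from slru.c: PageStatus, VictimChoice and choose_victim are hypothetical names, a larger lru_count is taken to mean "less recently used" (as in the quoted 8.1 code), "isn't either clean or write-busy" is read as "dirty", and the exclusion of latest_page_number is left out for brevity.

```c
#include <assert.h>

/* Toy stand-ins for the SLRU page states; NOT the real slru.h definitions. */
typedef enum {
    PAGE_EMPTY,
    PAGE_READ_IN_PROGRESS,
    PAGE_CLEAN,
    PAGE_DIRTY,
    PAGE_WRITE_IN_PROGRESS
} PageStatus;

/* What the caller should do with the chosen slot (hypothetical). */
typedef enum {
    ACT_USE_SLOT,     /* slot is empty or clean: reuse it immediately */
    ACT_START_WRITE,  /* start writing this slot out, then decide again */
    ACT_WAIT_IO       /* wait for I/O already running on this slot, then decide again */
} VictimAction;

typedef struct {
    VictimAction action;
    int          slot;
} VictimChoice;

/*
 * Pick a replacement victim. Assumption: a larger lru_count means the
 * slot was used less recently.
 */
static VictimChoice
choose_victim(const PageStatus *status, const unsigned int *lru_count, int nslots)
{
    int          oldest = -1;        /* overall-oldest slot, in any state */
    int          oldest_dirty = -1;  /* oldest slot that is dirty and idle */
    VictimChoice choice;

    assert(nslots > 0);

    for (int i = 0; i < nslots; i++)
    {
        if (status[i] == PAGE_EMPTY)
        {
            choice.action = ACT_USE_SLOT;
            choice.slot = i;
            return choice;
        }
        if (oldest < 0 || lru_count[i] > lru_count[oldest])
            oldest = i;
        if (status[i] == PAGE_DIRTY &&
            (oldest_dirty < 0 || lru_count[i] > lru_count[oldest_dirty]))
            oldest_dirty = i;
    }

    if (status[oldest] == PAGE_CLEAN)
    {
        /* The overall-oldest buffer is clean: return it. */
        choice.action = ACT_USE_SLOT;
        choice.slot = oldest;
    }
    else if (oldest_dirty >= 0)
    {
        /* Otherwise push out the oldest buffer nobody is writing yet. */
        choice.action = ACT_START_WRITE;
        choice.slot = oldest_dirty;
    }
    else
    {
        /* Everything old is already busy: just wait on the oldest one. */
        choice.action = ACT_WAIT_IO;
        choice.slot = oldest;
    }
    return choice;
}
```

A caller would act on the returned action and then, as described above, go back and make its decision again from the top. Because a write-busy slot is never picked for ACT_START_WRITE, concurrent processes in this sketch end up starting writes on different buffers.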

regards, tom lane



Re: [HACKERS] LWLock statistics collector (was: CSStorm occurred again by postgreSQL8.2)

2006-07-31 Thread Katsuhiko Okano
Hi all,

I have found the cause of the CSStorm problem described in my previous
mails, and a provisional patch resolves it, so I am reporting the results.

> Subject: [HACKERS] poor performance with Context Switch Storm at TPC-W.
> Date: Tue, 11 Jul 2006 20:09:24 +0900
> From: Katsuhiko Okano [EMAIL PROTECTED]
>
> poor performance with Context Switch Storm occurred
> with the following composition.


Background :
SAVEPOINT has been supported since PostgreSQL 8.0.
As a result, any transaction may contain one or more subtransactions.
When judging the visibility of a tuple, it is necessary to determine
whether the XID that inserted the tuple belongs to a top-level
transaction or to a subtransaction (if its XMIN is committed).
Making that determination requires accessing SubTrans
(the data structure that records the parent of each transaction ID).
SubTrans is accessed through an LRU buffer.
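
To make the lookup above concrete, here is a minimal sketch in C of resolving an XID to its top-level transaction by following parent links. It is a toy model, not PostgreSQL code: the Xid type, INVALID_XID, the in-memory parent_of array and topmost_xid are all hypothetical names, and a plain array stands in for the LRU-buffered SubTrans pages.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Xid;      /* hypothetical stand-in for a transaction ID */
#define INVALID_XID 0u         /* assumed marker: "no parent", i.e. top-level */

/*
 * Follow parent links until a transaction with no parent is reached.
 * parent_of[x] is the parent of x, or INVALID_XID if x is top-level.
 * In the real system each step would be a SubTrans page access through
 * the shared LRU buffer, which is where the contention in this report
 * comes from; here it is only an array read.
 */
static Xid
topmost_xid(const Xid *parent_of, size_t nxids, Xid xid)
{
    Xid cur = xid;

    assert(cur != INVALID_XID && cur < nxids);
    while (parent_of[cur] != INVALID_XID)
    {
        cur = parent_of[cur];
        assert(cur < nxids);
    }
    return cur;
}
```

With parent_of = {0, 0, 1, 2, 0}, XID 3 is a subtransaction of 2, which is a subtransaction of 1, so topmost_xid returns 1 for XID 3, while XID 4 has no parent and resolves to itself.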


Occurrence conditions :
The phenomenon occurs under the following conditions.
- There are transactions that frequently read a large number of tuples
(typically seq scans).
- There are updating transactions (at a suitable frequency).
- There is a long-lived transaction (of a suitable length).


Analysis :
(A) The buffer replacement algorithm is bad.
The timestamp of a page being swapped out is not refreshed until
the swap-out completes.
If other pages are accessed while the swap-out is in progress, the
page being swapped out becomes the least recently used one, and it
is chosen again as the page to swap out.
(When the workload is high.)

(B) SubTrans is accessed on every tuple visibility check, which is too frequent.
If many processes wait on an LWLock using semop, a CSStorm occurs.


Result :
For (A),
I created a patch that excludes pages in the read/write IN PROGRESS
state from the replacement candidates.
(It keeps a betterslot as a fallback for the case where all pages are
IN PROGRESS.)
I applied the patch.
However, the problem recurred; it was not a fundamental solution.

For (B),
a patch was created that treats every transaction as a top-level
transaction.
(Thank you, ITAGAKI.) I applied the patch and ran the measurement for 8 hours.
The CSStorm problem did not occur.


Discussion :
(1) Since this workload uses neither SAVEPOINT nor error trapping in
PL/pgSQL, subtransactions are unnecessary.
Would it be better to implement a mode that does not use subtransactions?

(2) It would be better if the result could be cached, in a structure
like CLOG, so that the LRU buffer does not have to be checked
every time.


Are there any problems with this, or any other ideas?

Katsuhiko Okano
okano katsuhiko _at_ oss ntt co jp



Re: [HACKERS] LWLock statistics collector (was: CSStorm occurred again by postgreSQL8.2)

2006-07-31 Thread Katsuhiko Okano
Katsuhiko Okano wrote:
> I have found the cause of the CSStorm problem described in my previous
> mails, and a provisional patch resolves it, so I am reporting the results.
(snip)
> (A) The buffer replacement algorithm is bad.
> The timestamp of a page being swapped out is not refreshed until
> the swap-out completes.
> If other pages are accessed while the swap-out is in progress, the
> page being swapped out becomes the least recently used one, and it
> is chosen again as the page to swap out.
> (When the workload is high.)

The following is the patch.


diff -cpr postgresql-8.1.4-orig/src/backend/access/transam/slru.c postgresql-8.1.4-SlruSelectLRUPage-fix/src/backend/access/transam/slru.c
*** postgresql-8.1.4-orig/src/backend/access/transam/slru.c 2006-01-21 13:38:27.0 +0900
--- postgresql-8.1.4-SlruSelectLRUPage-fix/src/backend/access/transam/slru.c 2006-07-25 18:02:49.0 +0900
*************** SlruSelectLRUPage(SlruCtl ctl, int pagen
*** 703,710 ****
      for (;;)
      {
          int         slotno;
!         int         bestslot = 0;
          unsigned int bestcount = 0;
  
          /* See if page already has a buffer assigned */
          for (slotno = 0; slotno < NUM_SLRU_BUFFERS; slotno++)
--- 703,712 ----
      for (;;)
      {
          int         slotno;
!         int         bestslot = -1;
!         int         betterslot = -1;
          unsigned int bestcount = 0;
+         unsigned int bettercount = 0;
  
          /* See if page already has a buffer assigned */
          for (slotno = 0; slotno < NUM_SLRU_BUFFERS; slotno++)
*************** SlruSelectLRUPage(SlruCtl ctl, int pagen
*** 720,732 ****
           */
          for (slotno = 0; slotno < NUM_SLRU_BUFFERS; slotno++)
          {
!             if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
!                 return slotno;
!             if (shared->page_lru_count[slotno] > bestcount &&
!                 shared->page_number[slotno] != shared->latest_page_number)
!             {
!                 bestslot = slotno;
!                 bestcount = shared->page_lru_count[slotno];
              }
          }
  
--- 722,746 ----
           */
          for (slotno = 0; slotno < NUM_SLRU_BUFFERS; slotno++)
          {
!             switch (shared->page_status[slotno])
!             {
!                 case SLRU_PAGE_EMPTY:
!                     return slotno;
!                 case SLRU_PAGE_READ_IN_PROGRESS:
!                 case SLRU_PAGE_WRITE_IN_PROGRESS:
!                     if (shared->page_lru_count[slotno] > bettercount &&
!                         shared->page_number[slotno] != shared->latest_page_number)
!                     {
!                         betterslot = slotno;
!                         bettercount = shared->page_lru_count[slotno];
!                     }
!                 default:    /* SLRU_PAGE_CLEAN,SLRU_PAGE_DIRTY */
!                     if (shared->page_lru_count[slotno] > bestcount &&
!                         shared->page_number[slotno] != shared->latest_page_number)
!                     {
!                         bestslot = slotno;
!                         bestcount = shared->page_lru_count[slotno];
!                     }
              }
          }
  
*************** SlruSelectLRUPage(SlruCtl ctl, int pagen
*** 736,741 ****
--- 750,758 ----
          if (shared->page_status[bestslot] == SLRU_PAGE_CLEAN)
              return bestslot;
  
+         if (bestslot == -1)
+             bestslot = betterslot;
+ 
          /*
           * We need to do I/O.  Normal case is that we have to write it out,
           * but it's possible in the worst case to have selected a read-busy

Regards,

Katsuhiko Okano
okano katsuhiko _at_ oss ntt co jp
