Re: CSStorm occurred again by postgreSQL8.2. (Re: [HACKERS] poor performance with Context Switch Storm at TPC-W.)

2006-07-19 Thread Katsuhiko Okano
"Tom Lane <[EMAIL PROTECTED]>" wrote:
> Katsuhiko Okano <[EMAIL PROTECTED]> writes:
> > It does not solve, even if it increases the number of NUM_SUBTRANS_BUFFERS.
> > The problem was only postponed.
> 
> Can you provide a reproducible test case for this?

Seven machines are required in order to perform measurement.
(DB*1,AP*2,CLient*4)
Enough work load was not able to be given in two machines.
(DB*1,{AP+CL}*1)


It was not able to reappear to a multiplex run of pgbench 
or a simple SELECT query.
TPC-W of a work load tool used this time is a full scratch.
Regrettably it cannot open to the public.
If there is a work load tool of a free license, I would like to try.


I will show if there is information required for others.


The patch which outputs the number of times of LWLock was used this time.
The following is old example output. FYI.

# SELECT * FROM pg_stat_lwlocks;
 kind |  pg_stat_get_lwlock_name   |  sh_call   |  sh_wait  |  ex_call  |  
ex_wait  | sleep 

--+++---+---+---+---

0 | BufMappingLock |  559375542 | 33542 |320092 | 
24025 | 0

1 | BufFreelistLock|  0 | 0 |370709 |   
 47 | 0

2 | LockMgrLock|  0 | 0 |  41718885 |
734502 | 0

3 | OidGenLock | 33 | 0 | 0 |   
  0 | 0

4 | XidGenLock |   12572279 | 10095 |  11299469 | 
20089 | 0

5 | ProcArrayLock  |8371330 | 72052 |  16965667 |
603294 | 0

6 | SInvalLock |   38822428 |   435 | 25917 |   
128 | 0

7 | FreeSpaceLock  |  0 | 0 | 16787 |   
  4 | 0

8 | WALInsertLock  |  0 | 0 |   1239911 |   
885 | 0

9 | WALWriteLock   |  0 | 0 | 69907 |  
5589 | 0

   10 | ControlFileLock|  0 | 0 | 16686 |   
  1 | 0

   11 | CheckpointLock |  0 | 0 |34 |   
  0 | 0

   12 | CheckpointStartLock|  69509 | 0 |34 |   
  1 | 0

   13 | CLogControlLock|  0 | 0 |236763 |   
183 | 0

   14 | SubtransControlLock|  0 | 0 | 753773945 | 
205273395 | 0

   15 | MultiXactGenLock   | 66 | 0 | 0 |   
  0 | 0

   16 | MultiXactOffsetControlLock |  0 | 0 |35 |   
  0 | 0

   17 | MultiXactMemberControlLock |  0 | 0 |34 |   
  0 | 0

   18 | RelCacheInitLock   |  0 | 0 | 0 |   
  0 | 0

   19 | BgWriterCommLock   |  0 | 0 | 61457 |   
  1 | 0

   20 | TwoPhaseStateLock  | 33 | 0 | 0 |   
  0 | 0

   21 | TablespaceCreateLock   |  0 | 0 | 0 |   
  0 | 0

   22 | BufferIO   |  0 | 0 |695627 |   
 16 | 0

   23 | BufferContent  | 3568231805 |  1897 |   1361394 |   
829 | 0

   24 | CLog   |  0 | 0 | 0 |   
  0 | 0

   25 | SubTrans   |  138571621 | 143208883 |   8122181 |   
8132646 | 0

   26 | MultiXactOffset|  0 | 0 | 0 |   
  0 | 0

   27 | MultiXactMember|  0 | 0 | 0 |   
  0 | 0

(28 rows)


I am pleased if interested.



regards,

Katsuhiko Okano
okano katsuhiko _at_ oss ntt co jp

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: CSStorm occurred again by postgreSQL8.2. (Re: [HACKERS] poor performance with Context Switch Storm at TPC-W.)

2006-07-18 Thread Tom Lane
Katsuhiko Okano <[EMAIL PROTECTED]> writes:
> It does not solve, even if it increases the number of NUM_SUBTRANS_BUFFERS.
> The problem was only postponed.

Can you provide a reproducible test case for this?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


CSStorm occurred again by postgreSQL8.2. (Re: [HACKERS] poor performance with Context Switch Storm at TPC-W.)

2006-07-18 Thread Katsuhiko Okano
Katsuhiko Okano wrote:
> By PostgreSQL8.2, NUM_SUBTRANS_BUFFERS was changed into 128
> and recompile and measured again.
> NOT occurrence of CSStorm. The value of WIPS was about 400.

measured again.
not occurrence when measured for 30 minutes.
but occurrence when measured for 3 hours, and 1 hour and 10 minutes passed.
It does not solve, even if it increases the number of NUM_SUBTRANS_BUFFERS.
The problem was only postponed.


> If the number of SLRU buffers is too low,
> also in PostgreSQL8.1.4, if the number of buffers is increased
> I think that the same result is brought.
> (Although the buffer of CLOG or a multi-transaction also increases,
> I think that effect is small)  
> 
> Now, NUM_SLRU_BUFFERS is changed into 128 in PostgreSQL8.1.4
> and is under measurement.

Occurrence CSStorm when the version 8.1.4 passed similarly for 
1 hour and 10 minutes.


A strange point,
The number of times of a LWLock lock for LRU buffers is 0 times
until CSStorm occurs.
After CSStorm occurs, the share lock and the exclusion lock are required and 
most locks are kept waiting.
(exclusion lock for SubtransControlLock is increased rapidly after CSStorm 
start.)


Is different processing done by whether CSStrom has occurred or not occurred?



regards,

Katsuhiko Okano
okano katsuhiko _at_ oss ntt co jp

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster