Re: [HACKERS] poor performance with Context Switch Storm at TPC-W.

2006-07-20 Thread Katsuhiko Okano
Jim C. Nasby wrote:
 If you haven't changed checkpoint timeout, this drop-off every 4-6
 minutes indicates that you need to make the bgwriter more aggressive.
I will pass this on to the customer when I present and explain our proposal.
Thank you for the information.

Regards,

Katsuhiko Okano
okano katsuhiko _at_ oss ntt co jp

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: CSStorm occurred again by postgreSQL8.2. (Re: [HACKERS] poor performance with Context Switch Storm at TPC-W.)

2006-07-19 Thread Katsuhiko Okano
Tom Lane [EMAIL PROTECTED] wrote:
 Katsuhiko Okano [EMAIL PROTECTED] writes:
  Increasing NUM_SUBTRANS_BUFFERS does not solve the problem.
  It only postpones it.
 
 Can you provide a reproducible test case for this?

The measurement requires seven machines
(DB*1, AP*2, Client*4).
With only two machines (DB*1, {AP+CL}*1),
we could not generate enough load.


I could not reproduce the problem with multiple concurrent pgbench runs
or with simple SELECT queries.
The TPC-W workload tool used this time was written entirely from scratch.
Unfortunately, it cannot be made public.
If there is a freely licensed workload tool, I would like to try it.


If any other information is needed, I will provide it.


For this measurement I used a patch that reports how many times each LWLock was requested and waited on.
The following is an older example of its output, FYI.

# SELECT * FROM pg_stat_lwlocks;
 kind |  pg_stat_get_lwlock_name   |  sh_call   |  sh_wait  |  ex_call  |  ex_wait  | sleep
------+----------------------------+------------+-----------+-----------+-----------+-------
    0 | BufMappingLock             |  559375542 |     33542 |    320092 |     24025 |     0
    1 | BufFreelistLock            |          0 |         0 |    370709 |        47 |     0
    2 | LockMgrLock                |          0 |         0 |  41718885 |    734502 |     0
    3 | OidGenLock                 |         33 |         0 |         0 |         0 |     0
    4 | XidGenLock                 |   12572279 |     10095 |  11299469 |     20089 |     0
    5 | ProcArrayLock              |    8371330 |     72052 |  16965667 |    603294 |     0
    6 | SInvalLock                 |   38822428 |       435 |     25917 |       128 |     0
    7 | FreeSpaceLock              |          0 |         0 |     16787 |         4 |     0
    8 | WALInsertLock              |          0 |         0 |   1239911 |       885 |     0
    9 | WALWriteLock               |          0 |         0 |     69907 |      5589 |     0
   10 | ControlFileLock            |          0 |         0 |     16686 |         1 |     0
   11 | CheckpointLock             |          0 |         0 |        34 |         0 |     0
   12 | CheckpointStartLock        |      69509 |         0 |        34 |         1 |     0
   13 | CLogControlLock            |          0 |         0 |    236763 |       183 |     0
   14 | SubtransControlLock        |          0 |         0 | 753773945 | 205273395 |     0
   15 | MultiXactGenLock           |         66 |         0 |         0 |         0 |     0
   16 | MultiXactOffsetControlLock |          0 |         0 |        35 |         0 |     0
   17 | MultiXactMemberControlLock |          0 |         0 |        34 |         0 |     0
   18 | RelCacheInitLock           |          0 |         0 |         0 |         0 |     0
   19 | BgWriterCommLock           |          0 |         0 |     61457 |         1 |     0
   20 | TwoPhaseStateLock          |         33 |         0 |         0 |         0 |     0
   21 | TablespaceCreateLock       |          0 |         0 |         0 |         0 |     0
   22 | BufferIO                   |          0 |         0 |    695627 |        16 |     0
   23 | BufferContent              | 3568231805 |      1897 |   1361394 |       829 |     0
   24 | CLog                       |          0 |         0 |         0 |         0 |     0
   25 | SubTrans                   |  138571621 | 143208883 |   8122181 |   8132646 |     0
   26 | MultiXactOffset            |          0 |         0 |         0 |         0 |     0
   27 | MultiXactMember            |          0 |         0 |         0 |         0 |     0
(28 rows)
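
To make the counters above easier to compare, here is a minimal Python sketch that ranks a few of the locks by total waits and by waits per call. The numbers are copied from the output above (which is described as an old example, so they may not match the latest runs); the helper name wait_summary and the choice of rows are only illustrative assumptions, not part of the patch.

```python
# Each row: (lock name, sh_call, sh_wait, ex_call, ex_wait), copied from the
# pg_stat_lwlocks output above. Only the busiest locks are listed.
ROWS = [
    ("BufMappingLock", 559375542, 33542, 320092, 24025),
    ("LockMgrLock", 0, 0, 41718885, 734502),
    ("XidGenLock", 12572279, 10095, 11299469, 20089),
    ("ProcArrayLock", 8371330, 72052, 16965667, 603294),
    ("SubtransControlLock", 0, 0, 753773945, 205273395),
    ("SubTrans", 138571621, 143208883, 8122181, 8132646),
]


def wait_summary(rows):
    """Return (name, total_waits, waits_per_call), sorted by total waits, descending."""
    out = []
    for name, sh_call, sh_wait, ex_call, ex_wait in rows:
        calls = sh_call + ex_call
        waits = sh_wait + ex_wait
        ratio = waits / calls if calls else 0.0
        out.append((name, waits, ratio))
    return sorted(out, key=lambda r: r[1], reverse=True)


summary = wait_summary(ROWS)
for name, waits, ratio in summary:
    print(f"{name:<22} waits={waits:>11} waits/call={ratio:.4f}")
```

On these figures SubtransControlLock and SubTrans account for far more waits than any other lock listed, which is why the discussion focuses on them.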


I would be glad if this is of interest.



regards,

Katsuhiko Okano
okano katsuhiko _at_ oss ntt co jp



Re: [HACKERS] poor performance with Context Switch Storm at TPC-W.

2006-07-19 Thread Jim C. Nasby
On Fri, Jul 14, 2006 at 02:58:36PM +0900, Katsuhiko Okano wrote:
 The CSStorm did NOT occur. WIPS was about 400.
 (However, WIPS dropped to about 320 at intervals of 4 to 6 minutes.)

If you haven't changed checkpoint timeout, this drop-off every 4-6
minutes indicates that you need to make the bgwriter more aggressive.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



CSStorm occurred again by postgreSQL8.2. (Re: [HACKERS] poor performance with Context Switch Storm at TPC-W.)

2006-07-18 Thread Katsuhiko Okano
Katsuhiko Okano wrote:
 With PostgreSQL 8.2, I changed NUM_SUBTRANS_BUFFERS to 128,
 recompiled, and measured again.
 The CSStorm did NOT occur. WIPS was about 400.

I measured again.
The CSStorm did not occur in a 30-minute run,
but in a 3-hour run it occurred after 1 hour and 10 minutes.
Increasing NUM_SUBTRANS_BUFFERS does not solve the problem.
It only postpones it.


 If the number of SLRU buffers is too low,
 then increasing the number of buffers in PostgreSQL 8.1.4
 should give the same result.
 (This also increases the CLOG and multixact buffers,
 but I think that effect is small.)
 
 NUM_SLRU_BUFFERS has now been changed to 128 in PostgreSQL 8.1.4,
 and the measurement is in progress.

With version 8.1.4, the CSStorm likewise occurred after
1 hour and 10 minutes.


One strange point:
the LWLock request count for the SLRU buffers stays at 0
until the CSStorm occurs.
After the CSStorm occurs, both shared and exclusive locks are requested, and
most of those requests are kept waiting.
(Exclusive lock requests for SubtransControlLock increase rapidly after the
CSStorm starts.)


Is different processing performed depending on whether the CSStorm has occurred?



regards,

Katsuhiko Okano
okano katsuhiko _at_ oss ntt co jp



Re: CSStorm occurred again by postgreSQL8.2. (Re: [HACKERS] poor performance with Context Switch Storm at TPC-W.)

2006-07-18 Thread Tom Lane
Katsuhiko Okano [EMAIL PROTECTED] writes:
 Increasing NUM_SUBTRANS_BUFFERS does not solve the problem.
 It only postpones it.

Can you provide a reproducible test case for this?

regards, tom lane



Re: [HACKERS] poor performance with Context Switch Storm at TPC-W.

2006-07-13 Thread Katsuhiko Okano
Hi.


Alvaro Herrera wrote:

 Katsuhiko Okano wrote:
 
  I suspected contention on BufMappingLock,
  but looking at the collected results,
  the occurrence of the CSStorm and the increase in BufMappingLock counts
  do not seem to correspond.
  Instead, the SubtransControlLock and SubTrans counts were increasing.
  I do not understand what the cause of the CSStorm is.
 
 Please see this thread:
 http://archives.postgresql.org/pgsql-hackers/2005-11/msg01547.php
 (actually it's a single message AFAICT)
 
 This was applied on the 8.2dev code, so I'm surprised that 8.2dev
 behaves the same as 8.1.
 
 Does your problem have any relationship to what's described there?
 
It is probably related.
I cannot tell whether the problem is the locking method itself or
simply that the number of SLRU buffers is too small.


 I also wondered whether the problem may be that the number of SLRU
 buffers we use for subtrans is too low.  But the number was increased
 from the default 8 to 32 in 8.2dev as well.  Maybe you could try
 increasing that even further; say 128 and see if the problem is still
 there.  (src/include/access/subtrans.h, NUM_SUBTRANS_BUFFERS).
With PostgreSQL 8.2, I changed NUM_SUBTRANS_BUFFERS to 128,
recompiled, and measured again.
The CSStorm did NOT occur. WIPS was about 400.
(However, WIPS dropped to about 320 at intervals of 4 to 6 minutes.)

If the number of SLRU buffers is too low,
then increasing the number of buffers in PostgreSQL 8.1.4
should give the same result.
(This also increases the CLOG and multixact buffers,
but I think that effect is small.)

NUM_SLRU_BUFFERS has now been changed to 128 in PostgreSQL 8.1.4,
and the measurement is in progress.


regards,

Katsuhiko Okano
okano katsuhiko _at_ oss ntt co jp



[HACKERS] poor performance with Context Switch Storm at TPC-W.

2006-07-11 Thread Katsuhiko Okano
Hi, all.

A problem has occurred at a customer site.
Poor performance accompanied by a Context Switch Storm (CSStorm) occurred
with the configuration below.
Normally, CS is about 5000 and WIPS is 360.
When the CSStorm occurs, CS is about 10 and WIPS is 60 or less.
(WIPS = number of web interactions per second)

I am investigating it using a patch
that collects LWLock statistics.

I suspected contention on BufMappingLock,
but looking at the collected results,
the occurrence of the CSStorm and the increase in BufMappingLock counts
do not seem to correspond.
Instead, the SubtransControlLock and SubTrans counts were increasing.
I do not understand what the cause of the CSStorm is.



[DB server]*1
Intel Xeon 3.0GHz*4(2CPU * H/T ON)
4GB Memory
Red Hat Enterprise Linux ES release 4(Nahant Update 3)
Linux version 2.6.9-34.ELsmp
PostgreSQL 8.1.3 (the problem also occurred with version 8.2 (head-6/15))
shared_buffers=131072
temp_buffers=1000 
max_connections=300

[AP server]*2
200 connection pooling.
TPC-W model workload

[Client]*4
TPC-W model workload



(1)
I read the following discussion.
http://archives.postgresql.org/pgsql-hackers/2006-05/msg01003.php
From: Tom Lane tgl ( at ) sss ( dot ) pgh ( dot ) pa ( dot ) us 
To: josh ( at ) agliodbs ( dot ) com 
Subject: Re: Further reduction of bufmgr lock contention 
Date: Wed, 24 May 2006 15:25:26 -0400 

If there is a patch or a technique for investigating this,
could someone point me to it?


(2)
Many sequential scans seem to occur during the CSStorm.
When a tuple is read, a visibility check is performed,
and this seems to involve a subtransaction access for every transaction.
How likely is it that
the LWLock for subtransactions causes the CSStorm?


best regards.

Katsuhiko Okano
okano katsuhiko _at_ oss ntt co jp



Re: [HACKERS] poor performance with Context Switch Storm at TPC-W.

2006-07-11 Thread Qingqing Zhou

Katsuhiko Okano [EMAIL PROTECTED] wrote

 A problem has occurred at a customer site.
 Poor performance accompanied by a Context Switch Storm (CSStorm) occurred
 with the configuration below.
 Normally, CS is about 5000 and WIPS is 360.
 When the CSStorm occurs, CS is about 10 and WIPS is 60 or less.

 Intel Xeon 3.0GHz*4(2CPU * H/T ON)
 4GB Memory

Do you have bgwriter on, and what are its parameters? I read a theory somewhere
that bgwriter scans a large portion of memory and causes L1/L2 thrashing, so
with HT on, the other backends sharing the physical processor with it also
get thrashed ... So try turning bgwriter off or turning HT off and see what
difference it makes.

Regards,
Qingqing





Re: [HACKERS] poor performance with Context Switch Storm at TPC-W.

2006-07-11 Thread Alvaro Herrera
Katsuhiko Okano wrote:

 I suspected contention on BufMappingLock,
 but looking at the collected results,
 the occurrence of the CSStorm and the increase in BufMappingLock counts
 do not seem to correspond.
 Instead, the SubtransControlLock and SubTrans counts were increasing.
 I do not understand what the cause of the CSStorm is.

Please see this thread:
http://archives.postgresql.org/pgsql-hackers/2005-11/msg01547.php
(actually it's a single message AFAICT)

This was applied on the 8.2dev code, so I'm surprised that 8.2dev
behaves the same as 8.1.

Does your problem have any relationship to what's described there?

I also wondered whether the problem may be that the number of SLRU
buffers we use for subtrans is too low.  But the number was increased
from the default 8 to 32 in 8.2dev as well.  Maybe you could try
increasing that even further; say 128 and see if the problem is still
there.  (src/include/access/subtrans.h, NUM_SUBTRANS_BUFFERS).
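
For a sense of scale, the sketch below estimates how many transaction IDs the subtrans SLRU can keep in memory for a given NUM_SUBTRANS_BUFFERS. It assumes a default build: BLCKSZ of 8192 bytes and a 4-byte TransactionId, so one pg_subtrans page holds 2048 entries. The function name xids_covered is made up for the illustration. Whether this window is what actually limits the workload here is not established in this thread.

```python
# Assumptions (default PostgreSQL build, not stated in this thread):
BLCKSZ = 8192      # bytes per SLRU page
XID_BYTES = 4      # sizeof(TransactionId); pg_subtrans stores one parent xid per xid

XACTS_PER_PAGE = BLCKSZ // XID_BYTES  # entries held by one pg_subtrans page


def xids_covered(num_buffers):
    """Number of consecutive xids whose subtrans entries fit in the buffers at once."""
    return num_buffers * XACTS_PER_PAGE


# 8 was the old default, 32 the 8.2dev value, 128 the suggested experiment.
coverage = {n: xids_covered(n) for n in (8, 32, 128)}
for n, xids in coverage.items():
    print(f"NUM_SUBTRANS_BUFFERS={n:>3}: about {xids} xids in memory")
```

Under these assumptions, once lookups span a wider xid range than the buffers cover, pages have to be swapped in and out under SubtransControlLock, so a larger buffer count widens the range without removing that limit.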

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] poor performance with Context Switch Storm at TPC-W.

2006-07-11 Thread Josh Berkus

Katsuhiko,

Have you tried turning HT off?   HT is not generally considered (even by 
Intel) a good idea for database applications.


--Josh



Re: [HACKERS] poor performance with Context Switch Storm at TPC-W.

2006-07-11 Thread Katsuhiko Okano
hello.


 Do you have bgwriter on, and what are its parameters? I read a theory somewhere
 that bgwriter scans a large portion of memory and causes L1/L2 thrashing, so
 with HT on, the other backends sharing the physical processor with it also
 get thrashed ... So try turning bgwriter off or turning HT off and see what
 difference it makes.

bgwriter is ON.
In postgresql.conf:
 # - Background writer -
 
 bgwriter_delay = 200            # 10-10000 milliseconds between rounds
 bgwriter_lru_percent = 1.0      # 0-100% of LRU buffers scanned/round
 bgwriter_lru_maxpages = 5       # 0-1000 buffers max written/round
 bgwriter_all_percent = 0.333    # 0-100% of all buffers scanned/round
 bgwriter_all_maxpages = 5       # 0-1000 buffers max written/round
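
As a rough illustration of what these settings allow, the Python sketch below computes the upper bound on pages the bgwriter can write per second and how long one full pass over the buffer pool takes. The settings are the ones listed above; shared_buffers=131072 comes from the original report. The 8192-byte page size is an assumption (default BLCKSZ), the function names are made up for the example, and the calculation ignores the time spent on the writes themselves.

```python
# Values from postgresql.conf as listed above and from the original report.
bgwriter_delay_ms = 200
bgwriter_lru_maxpages = 5
bgwriter_all_percent = 0.333
bgwriter_all_maxpages = 5
shared_buffers = 131072          # pages

BLCKSZ = 8192                    # assumption: default page size in bytes


def max_pages_per_second(delay_ms, lru_maxpages, all_maxpages):
    """Upper bound on pages written per second across both bgwriter scans."""
    rounds_per_second = 1000 / delay_ms
    return rounds_per_second * (lru_maxpages + all_maxpages)


def full_scan_seconds(delay_ms, all_percent):
    """Time for the 'all' scan to cover the whole buffer pool once."""
    rounds_needed = 100 / all_percent
    return rounds_needed * delay_ms / 1000


pages_per_sec = max_pages_per_second(
    bgwriter_delay_ms, bgwriter_lru_maxpages, bgwriter_all_maxpages)
bytes_per_sec = pages_per_sec * BLCKSZ
scan_seconds = full_scan_seconds(bgwriter_delay_ms, bgwriter_all_percent)
pool_bytes = shared_buffers * BLCKSZ

print(f"write ceiling : {pages_per_sec:.0f} pages/s ({bytes_per_sec / 1024:.0f} KiB/s)")
print(f"full pool scan: about {scan_seconds:.0f} s")
print(f"buffer pool   : {pool_bytes / 2**30:.0f} GiB")
```

With these numbers the write ceiling is small compared with the size of the buffer pool, so raising the maxpages or percent settings is one way to make the bgwriter more aggressive.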


I tried turning H/T OFF, but the CSStorm still occurred.
Normally, CS is about 5000.
When the CSStorm occurs, CS is about 7.
(The CS value is smaller than with H/T ON.
I think this is because CPU performance fell.)


Regards

Katsuhiko Okano
okano katsuhiko _at_ oss ntt co jp
