Re: [HACKERS] poor performance with Context Switch Storm at TPC-W.
"Jim C. Nasby" wrote:
> If you haven't changed checkpoint timeout, this drop-off every 4-6
> minutes indicates that you need to make the bgwriter more aggressive.

I will pass this on to the customer when I make my proposal and explanation.
Thank you for the information.

Regards,
Katsuhiko Okano
okano katsuhiko _at_ oss ntt co jp

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Re: [HACKERS] poor performance with Context Switch Storm at TPC-W.
On Fri, Jul 14, 2006 at 02:58:36PM +0900, Katsuhiko Okano wrote:
> The CSStorm did NOT occur. WIPS was about 400.
> (But WIPS dropped to about 320 at intervals of 4 to 6 minutes.)

If you haven't changed checkpoint timeout, this drop-off every 4-6
minutes indicates that you need to make the bgwriter more aggressive.
-- 
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
Re: CSStorm occurred again by postgreSQL8.2. (Re: [HACKERS] poor performance with Context Switch Storm at TPC-W.)
"Tom Lane <[EMAIL PROTECTED]>" wrote:
> Katsuhiko Okano <[EMAIL PROTECTED]> writes:
> > Increasing NUM_SUBTRANS_BUFFERS does not solve it;
> > the problem was only postponed.
>
> Can you provide a reproducible test case for this?

Seven machines are required to perform the measurement.
(DB*1, AP*2, Client*4)
Two machines could not generate enough workload.
(DB*1, {AP+CL}*1)
I could not reproduce it with a highly concurrent pgbench run
or with a simple SELECT query.
The TPC-W workload tool used this time was written from scratch.
Regrettably, it cannot be released to the public.
If there is a freely licensed workload tool, I would like to try it.
If any other information is needed, I will provide it.

This time I used a patch which outputs LWLock counts.
The following is an older example of its output. FYI.

# SELECT * FROM pg_stat_lwlocks;
 kind | pg_stat_get_lwlock_name    | sh_call    | sh_wait   | ex_call   | ex_wait   | sleep
------+----------------------------+------------+-----------+-----------+-----------+-------
    0 | BufMappingLock             |  559375542 |     33542 |    320092 |     24025 |     0
    1 | BufFreelistLock            |          0 |         0 |    370709 |        47 |     0
    2 | LockMgrLock                |          0 |         0 |  41718885 |    734502 |     0
    3 | OidGenLock                 |         33 |         0 |         0 |         0 |     0
    4 | XidGenLock                 |   12572279 |     10095 |  11299469 |     20089 |     0
    5 | ProcArrayLock              |    8371330 |     72052 |  16965667 |    603294 |     0
    6 | SInvalLock                 |   38822428 |       435 |     25917 |       128 |     0
    7 | FreeSpaceLock              |          0 |         0 |     16787 |         4 |     0
    8 | WALInsertLock              |          0 |         0 |   1239911 |       885 |     0
    9 | WALWriteLock               |          0 |         0 |     69907 |      5589 |     0
   10 | ControlFileLock            |          0 |         0 |     16686 |         1 |     0
   11 | CheckpointLock             |          0 |         0 |        34 |         0 |     0
   12 | CheckpointStartLock        |      69509 |         0 |        34 |         1 |     0
   13 | CLogControlLock            |          0 |         0 |    236763 |       183 |     0
   14 | SubtransControlLock        |          0 |         0 | 753773945 | 205273395 |     0
   15 | MultiXactGenLock           |         66 |         0 |         0 |         0 |     0
   16 | MultiXactOffsetControlLock |          0 |         0 |        35 |         0 |     0
   17 | MultiXactMemberControlLock |          0 |         0 |        34 |         0 |     0
   18 | RelCacheInitLock           |          0 |         0 |         0 |         0 |     0
   19 | BgWriterCommLock           |          0 |         0 |     61457 |         1 |     0
   20 | TwoPhaseStateLock          |         33 |         0 |         0 |         0 |     0
   21 | TablespaceCreateLock       |          0 |         0 |         0 |         0 |     0
   22 | BufferIO                   |          0 |         0 |    695627 |        16 |     0
   23 | BufferContent              | 3568231805 |      1897 |   1361394 |       829 |     0
   24 | CLog                       |          0 |         0 |         0 |         0 |     0
   25 | SubTrans                   |  138571621 | 143208883 |   8122181 |   8132646 |     0
   26 | MultiXactOffset            |          0 |         0 |         0 |         0 |     0
   27 | MultiXactMember            |          0 |         0 |         0 |         0 |     0
(28 rows)

I would be glad if anyone is interested.

regards,
Katsuhiko Okano
okano katsuhiko _at_ oss ntt co jp
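The counters in the pg_stat_lwlocks output above are easier to compare as ratios. The following Python sketch (not part of the LWLock patch) divides each wait count by the matching call count for four rows copied from that output. It assumes, from the column names only, that `sh_call`/`ex_call` count shared/exclusive acquisition requests and `sh_wait`/`ex_wait` count the requests that had to wait; the patch was not published in this thread, so that reading is a guess.

```python
# Four rows copied from the pg_stat_lwlocks output above:
# name -> (sh_call, sh_wait, ex_call, ex_wait)
ROWS = {
    "LockMgrLock": (0, 0, 41718885, 734502),
    "ProcArrayLock": (8371330, 72052, 16965667, 603294),
    "SubtransControlLock": (0, 0, 753773945, 205273395),
    "SubTrans": (138571621, 143208883, 8122181, 8132646),
}


def wait_ratio(calls, waits):
    """Waits per requested acquisition, or None if the lock was never requested."""
    return None if calls == 0 else waits / calls


def fmt(ratio):
    """Render a ratio as a percentage, or '-' when there is nothing to divide."""
    return "-" if ratio is None else f"{ratio:.1%}"


for name, (sh_call, sh_wait, ex_call, ex_wait) in ROWS.items():
    print(f"{name:<20} shared {fmt(wait_ratio(sh_call, sh_wait)):>7}"
          f"  exclusive {fmt(wait_ratio(ex_call, ex_wait)):>7}")
```

With this reading, roughly a quarter of the exclusive SubtransControlLock requests waited, against under 2% for LockMgrLock. The SubTrans ratios come out slightly above 100%, which would fit a patch that counts each retry rather than each request; the thread does not say which.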
Re: CSStorm occurred again by postgreSQL8.2. (Re: [HACKERS] poor performance with Context Switch Storm at TPC-W.)
Katsuhiko Okano <[EMAIL PROTECTED]> writes:
> Increasing NUM_SUBTRANS_BUFFERS does not solve it;
> the problem was only postponed.

Can you provide a reproducible test case for this?

			regards, tom lane
CSStorm occurred again by postgreSQL8.2. (Re: [HACKERS] poor performance with Context Switch Storm at TPC-W.)
Katsuhiko Okano wrote:
> With PostgreSQL 8.2, I changed NUM_SUBTRANS_BUFFERS to 128,
> recompiled, and measured again.
> The CSStorm did NOT occur. WIPS was about 400.

I measured again.
It did not occur when measured for 30 minutes,
but when measured for 3 hours, it occurred after 1 hour and 10 minutes.
Increasing NUM_SUBTRANS_BUFFERS does not solve it;
the problem was only postponed.

> If the number of SLRU buffers is too low,
> I think increasing the number of buffers in PostgreSQL 8.1.4
> will give the same result.
> (Although the CLOG and multixact buffers also increase,
> I think that effect is small.)
>
> Now I have changed NUM_SLRU_BUFFERS to 128 in PostgreSQL 8.1.4
> and am measuring.

With 8.1.4, the CSStorm likewise occurred after 1 hour and 10 minutes.

One strange point:
the LWLock count for the LRU buffers is 0 until the CSStorm occurs.
After the CSStorm starts, both shared and exclusive locks are requested,
and most of them have to wait.
(Exclusive locks on SubtransControlLock increase rapidly after the CSStorm starts.)
Is different processing done depending on whether the CSStorm has occurred or not?

regards,
Katsuhiko Okano
okano katsuhiko _at_ oss ntt co jp
Re: [HACKERS] poor performance with Context Switch Storm at TPC-W.
Hi.

Alvaro Herrera wrote:
> Katsuhiko Okano wrote:
>
> > I suspected contention on BufMappingLock.
> > However, looking at the collected results,
> > the occurrence of the CSStorm and the increase in BufMappingLock counts
> > do not seem to correspond.
> > Instead, SubtransControlLock and SubTrans were increasing.
> > I do not understand what causes the CSStorm.
>
> Please see this thread:
> http://archives.postgresql.org/pgsql-hackers/2005-11/msg01547.php
> (actually it's a single message AFAICT)
>
> This was applied on the 8.2dev code, so I'm surprised that 8.2dev
> behaves the same as 8.1.
>
> Does your problem have any relationship to what's described there?
>
Probably it is related.
I cannot tell whether the locking method itself is bad
or the number of LRU buffers is simply too small.

> I also wondered whether the problem may be that the number of SLRU
> buffers we use for subtrans is too low. But the number was increased
> from the default 8 to 32 in 8.2dev as well. Maybe you could try
> increasing that even further; say 128 and see if the problem is still
> there. (src/include/access/subtrans.h, NUM_SUBTRANS_BUFFERS).

With PostgreSQL 8.2, I changed NUM_SUBTRANS_BUFFERS to 128,
recompiled, and measured again.
The CSStorm did NOT occur. WIPS was about 400.
(But WIPS dropped to about 320 at intervals of 4 to 6 minutes.)

If the number of SLRU buffers is too low,
I think increasing the number of buffers in PostgreSQL 8.1.4
will give the same result.
(Although the CLOG and multixact buffers also increase,
I think that effect is small.)

Now I have changed NUM_SLRU_BUFFERS to 128 in PostgreSQL 8.1.4
and am measuring.

regards,
Katsuhiko Okano
okano katsuhiko _at_ oss ntt co jp
Re: [HACKERS] poor performance with Context Switch Storm at TPC-W.
hello.

> Do you have the bgwriter on, and what are its parameters? I read a theory
> somewhere that the bgwriter scans a large portion of memory and causes L1/L2
> cache thrashing, so with HT on, the other backends sharing the physical
> processor with it also get thrashed... So try turning the bgwriter off or
> turning HT off, and see what difference it makes.

The bgwriter is ON.
In postgresql.conf:
> # - Background writer -
>
> bgwriter_delay = 200            # 10-10000 milliseconds between rounds
> bgwriter_lru_percent = 1.0      # 0-100% of LRU buffers scanned/round
> bgwriter_lru_maxpages = 5       # 0-1000 buffers max written/round
> bgwriter_all_percent = 0.333    # 0-100% of all buffers scanned/round
> bgwriter_all_maxpages = 5       # 0-1000 buffers max written/round

I tried turning H/T OFF, but the CSStorm still occurred.
Usually, CS is about 5000.
When the CSStorm occurs, CS is about 7.
(CS is lower than when H/T is ON. I think that is because CPU performance dropped.)

Regards
Katsuhiko Okano
okano katsuhiko _at_ oss ntt co jp
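To gauge Jim's suggestion to make the bgwriter more aggressive, the settings quoted above can be turned into an upper bound on write throughput. This Python sketch assumes the default 8 kB block size, the default 300 s `checkpoint_timeout`, and that each round writes at most `bgwriter_lru_maxpages + bgwriter_all_maxpages` pages; it ignores how many pages are actually dirty at any moment.

```python
BLCKSZ = 8192  # assumed: default 8 kB PostgreSQL block size


def bgwriter_max_rate(delay_ms, lru_maxpages, all_maxpages, blcksz=BLCKSZ):
    """Upper bound on pages and bytes written per second by the bgwriter,
    assuming each round writes at most lru_maxpages + all_maxpages pages."""
    rounds_per_sec = 1000.0 / delay_ms
    pages_per_sec = rounds_per_sec * (lru_maxpages + all_maxpages)
    return pages_per_sec, pages_per_sec * blcksz


# Values from the postgresql.conf excerpt above.
pages_per_sec, bytes_per_sec = bgwriter_max_rate(200, 5, 5)
shared_buffers = 131072
checkpoint_timeout_s = 300  # assumed: default checkpoint_timeout, unchanged

pages_per_interval = pages_per_sec * checkpoint_timeout_s
fraction = pages_per_interval / shared_buffers
print(f"at most {pages_per_sec:.0f} pages/s ({bytes_per_sec / 1024:.0f} KiB/s)")
print(f"at most {pages_per_interval:.0f} pages per checkpoint interval "
      f"({fraction:.1%} of shared_buffers)")
```

Under these assumptions the bgwriter can clean at most 50 pages (400 KiB) per second, or about 11% of the 131072 shared buffers per 5-minute interval; anything the workload dirties beyond that is left for the checkpoint to write. Raising the `*_maxpages` values or lowering `bgwriter_delay` raises this bound.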
Re: [HACKERS] poor performance with Context Switch Storm at TPC-W.
Katsuhiko,

Have you tried turning HT off? HT is not generally considered (even by
Intel) a good idea for database applications.

--Josh
Re: [HACKERS] poor performance with Context Switch Storm at TPC-W.
Katsuhiko Okano wrote:
> I suspected contention on BufMappingLock.
> However, looking at the collected results,
> the occurrence of the CSStorm and the increase in BufMappingLock counts
> do not seem to correspond.
> Instead, SubtransControlLock and SubTrans were increasing.
> I do not understand what causes the CSStorm.

Please see this thread:
http://archives.postgresql.org/pgsql-hackers/2005-11/msg01547.php
(actually it's a single message AFAICT)

This was applied on the 8.2dev code, so I'm surprised that 8.2dev
behaves the same as 8.1.

Does your problem have any relationship to what's described there?

I also wondered whether the problem may be that the number of SLRU
buffers we use for subtrans is too low. But the number was increased
from the default 8 to 32 in 8.2dev as well. Maybe you could try
increasing that even further; say 128 and see if the problem is still
there. (src/include/access/subtrans.h, NUM_SUBTRANS_BUFFERS).

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
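Alvaro's question about whether the subtrans buffer count is too low can be illustrated with a toy model. The Python sketch below uses a strict-LRU page cache (PostgreSQL's SLRU code has its own approximate LRU, so this is not slru.c) and assumes pg_subtrans stores one 4-byte parent XID per transaction in 8 kB pages. Under strict LRU, cycling over a working set that fits in the buffers misses only on the first pass, while a working set just one page larger misses on every access.

```python
from collections import OrderedDict

XIDS_PER_PAGE = 8192 // 4  # assumed: 8 kB pages, one 4-byte parent XID per entry


class LRUBuffers:
    """Toy fixed-size page cache with strict LRU replacement (not slru.c)."""

    def __init__(self, nbuffers):
        self.nbuffers = nbuffers
        self.pages = OrderedDict()  # page number -> present; order = recency
        self.hits = 0
        self.misses = 0

    def access(self, pageno):
        if pageno in self.pages:
            self.pages.move_to_end(pageno)  # mark as most recently used
            self.hits += 1
            return
        self.misses += 1
        if len(self.pages) >= self.nbuffers:
            self.pages.popitem(last=False)  # evict the least recently used page
        self.pages[pageno] = True


def cyclic_miss_rate(nbuffers, working_set_pages, passes=10):
    """Miss rate when repeatedly cycling over pages 0..working_set_pages-1."""
    cache = LRUBuffers(nbuffers)
    for _ in range(passes):
        for pageno in range(working_set_pages):
            cache.access(pageno)
    return cache.misses / (cache.hits + cache.misses)


for nbuffers in (8, 32, 128):
    print(f"{nbuffers} buffers cover {nbuffers * XIDS_PER_PAGE} XIDs")
    for ws in (nbuffers, nbuffers + 1):
        print(f"  working set {ws} pages: miss rate {cyclic_miss_rate(nbuffers, ws):.2f}")
```

This cliff is one way a larger buffer count could postpone a problem rather than remove it: if the range of XIDs being looked up keeps growing, it eventually outgrows any fixed number of buffers. Whether that is what happens in this workload is not established in the thread.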
Re: [HACKERS] poor performance with Context Switch Storm at TPC-W.
"Katsuhiko Okano" <[EMAIL PROTECTED]> wrote
>
> This problem has occurred at one of my customers.
> Poor performance with a Context Switch Storm occurred
> with the following configuration.
> Usually, CS is about 5000 and WIPS=360.
> When the CSStorm occurs, CS is about 10 and WIPS=60 or less.
>
> Intel Xeon 3.0GHz*4(2CPU * H/T ON)
> 4GB Memory

Do you have the bgwriter on, and what are its parameters? I read a theory
somewhere that the bgwriter scans a large portion of memory and causes L1/L2
cache thrashing, so with HT on, the other backends sharing the physical
processor with it also get thrashed... So try turning the bgwriter off or
turning HT off, and see what difference it makes.

Regards,
Qingqing
[HACKERS] poor performance with Context Switch Storm at TPC-W.
Hi, all.

This problem has occurred at one of my customers.
Poor performance with a Context Switch Storm occurred
with the following configuration.
Usually, CS is about 5000 and WIPS=360.
When the CSStorm occurs, CS is about 10 and WIPS=60 or less.
(WIPS = number of web interactions per second)

I am investigating it with a patch that collects LWLock statistics.
I suspected contention on BufMappingLock.
However, looking at the collected results,
the occurrence of the CSStorm and the increase in BufMappingLock counts
do not seem to correspond.
Instead, SubtransControlLock and SubTrans were increasing.
I do not understand what causes the CSStorm.

[DB server]*1
Intel Xeon 3.0GHz*4(2CPU * H/T ON)
4GB Memory
Red Hat Enterprise Linux ES release 4(Nahant Update 3)
Linux version 2.6.9-34.ELsmp
PostgreSQL8.1.3
(It also occurred with version 8.2 (HEAD as of 6/15).)
shared_buffers=131072
temp_buffers=1000
max_connections=300

[AP server]*2
200 connection pooling.
TPC-W model workload

[Client]*4
TPC-W model workload

(1)
I have read the following discussion.
http://archives.postgresql.org/pgsql-hackers/2006-05/msg01003.php
From: Tom Lane
To: josh ( at ) agliodbs ( dot ) com
Subject: Re: Further reduction of bufmgr lock contention
Date: Wed, 24 May 2006 15:25:26 -0400

If there is a patch or a technique for investigating this,
could someone show it to me?

(2)
It seems that many sequential scans occur during the CSStorm.
When a tuple is read, the visibility ("satisfies") check is performed.
It seems that a subtransaction is generated for every transaction.
How likely is it that the LWLocks for subtransactions cause the CSStorm?

best regards.
Katsuhiko Okano
okano katsuhiko _at_ oss ntt co jp
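Question (2) asks how visibility checks lead to subtransaction lookups. The following Python sketch is a toy model loosely based on `SubTransGetTopmostTransaction()` in src/backend/access/transam/subtrans.c: it follows recorded parent XIDs until it reaches one with no parent, and records which pg_subtrans page each step would read. The dict standing in for pg_subtrans and the 2048-XIDs-per-page figure (8 kB pages, 4-byte entries) are assumptions; the real function also stops early at `TransactionXmin`, and each of its page reads goes through the shared SLRU buffers under SubtransControlLock.

```python
XIDS_PER_PAGE = 8192 // 4  # assumed: 8 kB pages, one 4-byte parent XID per entry
INVALID_XID = 0            # stands in for InvalidTransactionId


def topmost_xid(xid, parent_of, pages_read):
    """Toy model: follow parent links until an XID with no recorded parent.

    parent_of  -- dict mapping subtransaction XID -> parent XID (hypothetical
                  stand-in for the pg_subtrans SLRU)
    pages_read -- list; the pg_subtrans page consulted by each step is appended
    """
    while True:
        pages_read.append(xid // XIDS_PER_PAGE)  # one page lookup per step
        parent = parent_of.get(xid, INVALID_XID)
        if parent == INVALID_XID:
            return xid
        xid = parent


# Top-level XID 1000 with a nested subtransaction chain 1002 -> 1001 -> 1000.
parents = {1001: 1000, 1002: 1001}
pages = []
print(topmost_xid(1002, parents, pages), pages)  # 1000 [0, 0, 0]
```

Every step is a separate trip through the shared buffers, so a sequential scan that checks many tuples can generate many lock acquisitions even when each chain is short.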