Re: [HACKERS] LWLock statistics collector (was: CSStorm occurred again by postgreSQL8.2)
Tom Lane wrote:
> I'm confused ... is this patch being proposed for inclusion? I understood
> your previous message to say that it didn't help much.

This patch was only for isolating where the problem is.

> The patch is buggy as posted, because it will try to do this:
> 	if (shared->page_status[bestslot] == SLRU_PAGE_CLEAN)
> 		return bestslot;
> while bestslot could still be -1. A check is required.

Understood.

> (They will pick a different buffer, because the guy who got the buffer
> will have done SlruRecentlyUsed on it before releasing the control lock
> --- so I don't believe the worry that we get a buffer thrash scenario
> here. Look at the callers of SlruSelectLRUPage, not just the function
> itself.)

Hmm, I have read the code again.

> otherwise to initiate I/O on the oldest buffer that isn't either clean
> or write-busy, if there is one;

This point is important, but it is difficult for me to understand.

Katsuhiko Okano
okano katsuhiko _at_ oss ntt co jp

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [HACKERS] LWLock statistics collector (was: CSStorm occurred again by postgreSQL8.2)
Hi, all.

Since the cause of the CSStorm problem from my previous mails was found, and a provisional patch solved it, I am reporting the results.

  Subject: [HACKERS] poor performance with Context Switch Storm at TPC-W.
  Date: Tue, 11 Jul 2006 20:09:24 +0900
  From: Katsuhiko Okano [EMAIL PROTECTED]

Background:
Since PostgreSQL 8.0, SAVEPOINT has been supported. In our workload, every transaction contains one or more subtransactions. When judging the visibility of a tuple, the XID which inserted the tuple must be resolved to either a top transaction or a subtransaction (when checking whether its XMIN committed). To make that judgment it is necessary to access SubTrans (the data structure which manages the parent of each transaction ID). SubTrans is accessed through an LRU buffer.

Conditions of occurrence:
The phenomenon occurs under the following conditions:
- There are transactions which read tuples in large quantity and with high frequency (typically seq scans).
- There are updating transactions (at an appropriate rate).
- There is a long-lived transaction (of appropriate length).

Points of view:
(A) The buffer-replacement algorithm behaves badly. The LRU timestamp of a page being swapped out is not refreshed until the swap-out completes. If other pages are accessed while the swap is in progress, the swapped-out page again looks like the least recently used one, so it is chosen as the swap-out victim again (when the workload is high).
(B) Accessing SubTrans at every tuple-visibility judgment is very frequent. If many processes wait on the LWLock via semop, a CSStorm occurs.

Results:
For (A), I created a patch so that pages with read/write I/O in progress are not made replacement candidates. (It keeps a "betterslot" fallback for the case where all pages are in progress.) The patch was applied, but the problem recurred; it is not a fundamental solution.
For (B), a patch was created which treats all transactions as top transactions (thank you, ITAGAKI). The patch was applied and measured for 8 hours; the CSStorm problem stopped.

Discussion:
(1) Since neither SAVEPOINT nor error trapping in PL/pgSQL is used here, the subtransactions are unnecessary. Would it be better to implement a mode which does not use subtransactions?
(2) It would be better if SubTrans could be cached with a structure like CLOG, so that the LRU buffer need not be checked on every access.

Are there problems with these, or other ideas?

Katsuhiko Okano
okano katsuhiko _at_ oss ntt co jp
Re: [HACKERS] LWLock statistics collector (was: CSStorm occurred again by postgreSQL8.2)
Katsuhiko Okano wrote:
> Since the cause of the CSStorm problem from my previous mails was found,
> and a provisional patch solved it, I am reporting the results.
(snip)
> (A) The buffer-replacement algorithm behaves badly. The LRU timestamp of
> a page being swapped out is not refreshed until the swap-out completes.
> If other pages are accessed while the swap is in progress, the
> swapped-out page again looks like the least recently used one, so it is
> chosen as the swap-out victim again (when the workload is high).

The following is the patch.

diff -cpr postgresql-8.1.4-orig/src/backend/access/transam/slru.c postgresql-8.1.4-SlruSelectLRUPage-fix/src/backend/access/transam/slru.c
*** postgresql-8.1.4-orig/src/backend/access/transam/slru.c	2006-01-21 13:38:27.0 +0900
--- postgresql-8.1.4-SlruSelectLRUPage-fix/src/backend/access/transam/slru.c	2006-07-25 18:02:49.0 +0900
*** SlruSelectLRUPage(SlruCtl ctl, int pagen
*** 703,710 ****
  	for (;;)
  	{
  		int			slotno;
! 		int			bestslot = 0;
  		unsigned int bestcount = 0;
  
  		/* See if page already has a buffer assigned */
  		for (slotno = 0; slotno < NUM_SLRU_BUFFERS; slotno++)
--- 703,712 ----
  	for (;;)
  	{
  		int			slotno;
! 		int			bestslot = -1;
! 		int			betterslot = -1;
  		unsigned int bestcount = 0;
+ 		unsigned int bettercount = 0;
  
  		/* See if page already has a buffer assigned */
  		for (slotno = 0; slotno < NUM_SLRU_BUFFERS; slotno++)
*** SlruSelectLRUPage(SlruCtl ctl, int pagen
*** 720,732 ****
  		 */
  		for (slotno = 0; slotno < NUM_SLRU_BUFFERS; slotno++)
  		{
! 			if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
! 				return slotno;
! 			if (shared->page_lru_count[slotno] > bestcount &&
! 				shared->page_number[slotno] != shared->latest_page_number)
! 			{
! 				bestslot = slotno;
! 				bestcount = shared->page_lru_count[slotno];
  			}
  		}
--- 722,746 ----
  		 */
  		for (slotno = 0; slotno < NUM_SLRU_BUFFERS; slotno++)
  		{
! 			switch (shared->page_status[slotno])
! 			{
! 				case SLRU_PAGE_EMPTY:
! 					return slotno;
! 				case SLRU_PAGE_READ_IN_PROGRESS:
! 				case SLRU_PAGE_WRITE_IN_PROGRESS:
! 					if (shared->page_lru_count[slotno] > bettercount &&
! 						shared->page_number[slotno] != shared->latest_page_number)
! 					{
! 						betterslot = slotno;
! 						bettercount = shared->page_lru_count[slotno];
! 					}
! 					break;
! 				default:	/* SLRU_PAGE_CLEAN, SLRU_PAGE_DIRTY */
! 					if (shared->page_lru_count[slotno] > bestcount &&
! 						shared->page_number[slotno] != shared->latest_page_number)
! 					{
! 						bestslot = slotno;
! 						bestcount = shared->page_lru_count[slotno];
! 					}
! 					break;
  			}
  		}
*** SlruSelectLRUPage(SlruCtl ctl, int pagen
*** 736,741 ****
--- 750,758 ----
  		if (shared->page_status[bestslot] == SLRU_PAGE_CLEAN)
  			return bestslot;
  
+ 		if (bestslot == -1)
+ 			bestslot = betterslot;
+ 
  		/*
  		 * We need to do I/O.  Normal case is that we have to write it out,
  		 * but it's possible in the worst case to have selected a read-busy

Regards,
Katsuhiko Okano
okano katsuhiko _at_ oss ntt co jp
Re: CSStorm occurred again by postgreSQL8.2. (Re: [HACKERS] poor
> > If there is a workload tool with a free license, I would like to try it.
>
> FYI: there is a free TPC-W implementation done by Jan, available at:
> http://pgfoundry.org/projects/tpc-w-php/
>
> FYI(2): There is one more (pseudo) TPC-W implementation, by OSDL:
> http://www.osdl.org/lab_activities/kernel_testing/osdl_database_test_suite/osdl_dbt-1/

Thank you for the information. I'll try them.

Regards,
Katsuhiko Okano
okano katsuhiko _at_ oss ntt co jp
Re: [HACKERS] poor performance with Context Switch Storm at TPC-W.
Jim C. Nasby wrote:
> If you haven't changed checkpoint timeout, this drop-off every 4-6
> minutes indicates that you need to make the bgwriter more aggressive.

I will explain this to the customer when making my proposal. Thank you for the information.

Regards,
Katsuhiko Okano
okano katsuhiko _at_ oss ntt co jp
Re: CSStorm occurred again by postgreSQL8.2. (Re: [HACKERS] poor performance with Context Switch Storm at TPC-W.)
Tom Lane [EMAIL PROTECTED] wrote:
> Katsuhiko Okano [EMAIL PROTECTED] writes:
> > It does not solve, even if it increases the number of
> > NUM_SUBTRANS_BUFFERS. The problem was only postponed.
>
> Can you provide a reproducible test case for this?

Seven machines are required in order to perform the measurement (DB*1, AP*2, Client*4). Enough workload could not be generated with two machines (DB*1, {AP+CL}*1). I could not reproduce it with a multiplexed run of pgbench or with a simple SELECT query. The TPC-W workload tool used this time was written from scratch; regrettably, it cannot be opened to the public. If there is a workload tool with a free license, I would like to try it. I will provide whatever information others require.

The patch which counts LWLock acquisitions was used this time. The following is example output from an earlier run, FYI.

# SELECT * FROM pg_stat_lwlocks;
 kind | pg_stat_get_lwlock_name    |    sh_call |   sh_wait |   ex_call |   ex_wait | sleep
------+----------------------------+------------+-----------+-----------+-----------+-------
    0 | BufMappingLock             |  559375542 |     33542 |    320092 |     24025 |     0
    1 | BufFreelistLock            |          0 |         0 |    370709 |        47 |     0
    2 | LockMgrLock                |          0 |         0 |  41718885 |    734502 |     0
    3 | OidGenLock                 |         33 |         0 |         0 |         0 |     0
    4 | XidGenLock                 |   12572279 |     10095 |  11299469 |     20089 |     0
    5 | ProcArrayLock              |    8371330 |     72052 |  16965667 |    603294 |     0
    6 | SInvalLock                 |   38822428 |       435 |     25917 |       128 |     0
    7 | FreeSpaceLock              |          0 |         0 |     16787 |         4 |     0
    8 | WALInsertLock              |          0 |         0 |   1239911 |       885 |     0
    9 | WALWriteLock               |          0 |         0 |     69907 |      5589 |     0
   10 | ControlFileLock            |          0 |         0 |     16686 |         1 |     0
   11 | CheckpointLock             |          0 |         0 |        34 |         0 |     0
   12 | CheckpointStartLock        |      69509 |         0 |        34 |         1 |     0
   13 | CLogControlLock            |          0 |         0 |    236763 |       183 |     0
   14 | SubtransControlLock        |          0 |         0 | 753773945 | 205273395 |     0
   15 | MultiXactGenLock           |         66 |         0 |         0 |         0 |     0
   16 | MultiXactOffsetControlLock |          0 |         0 |        35 |         0 |     0
   17 | MultiXactMemberControlLock |          0 |         0 |        34 |         0 |     0
   18 | RelCacheInitLock           |          0 |         0 |         0 |         0 |     0
   19 | BgWriterCommLock           |          0 |         0 |     61457 |         1 |     0
   20 | TwoPhaseStateLock          |         33 |         0 |         0 |         0 |     0
   21 | TablespaceCreateLock       |          0 |         0 |         0 |         0 |     0
   22 | BufferIO                   |          0 |         0 |    695627 |        16 |     0
   23 | BufferContent              | 3568231805 |      1897 |   1361394 |       829 |     0
   24 | CLog                       |          0 |         0 |         0 |         0 |     0
   25 | SubTrans                   |  138571621 | 143208883 |   8122181 |   8132646 |     0
   26 | MultiXactOffset            |          0 |         0 |         0 |         0 |     0
   27 | MultiXactMember            |          0 |         0 |         0 |         0 |     0
(28 rows)

I would be pleased if this is of interest.

regards,
Katsuhiko Okano
okano katsuhiko _at_ oss ntt co jp
CSStorm occurred again by postgreSQL8.2. (Re: [HACKERS] poor performance with Context Switch Storm at TPC-W.)
Katsuhiko Okano wrote:
> In PostgreSQL 8.2, NUM_SUBTRANS_BUFFERS was changed to 128, the server
> was recompiled, and I measured again. No occurrence of CSStorm; the
> value of WIPS was about 400.

I measured again. There was no occurrence in a 30-minute run, but in a 3-hour run the CSStorm occurred after 1 hour and 10 minutes had passed. Increasing NUM_SUBTRANS_BUFFERS does not solve the problem; it only postpones it.

> If the number of SLRU buffers is too low, I think increasing the number
> of buffers in PostgreSQL 8.1.4 would bring the same result. (Although
> the buffers for CLOG and multixact also increase, I think that effect is
> small.) Now NUM_SLRU_BUFFERS is changed to 128 in PostgreSQL 8.1.4 and
> measurement is under way.

In version 8.1.4 the CSStorm likewise occurred after 1 hour and 10 minutes.

A strange point: the number of LWLock acquisitions on the SubTrans LRU buffers is zero until the CSStorm occurs. After the CSStorm starts, both shared and exclusive locks are requested and most requests are kept waiting. (The exclusive-lock count for SubtransControlLock increases rapidly after the CSStorm starts.) Is different processing done depending on whether a CSStorm has occurred or not?

regards,
Katsuhiko Okano
okano katsuhiko _at_ oss ntt co jp
Re: [HACKERS] poor performance with Context Switch Storm at TPC-W.
Hi.

Alvaro Herrera wrote:
> Katsuhiko Okano wrote:
> > I suspected contention on BufMappingLock, but from the collected
> > results, the occurrence of CSStorm does not seem to correspond to an
> > increase of the BufMappingLock counts. Instead, SubtransControlLock
> > and SubTrans were increasing. I do not understand what the cause of
> > the CSStorm is.
>
> Please see this thread:
> http://archives.postgresql.org/pgsql-hackers/2005-11/msg01547.php
> (actually it's a single message AFAICT)
> This was applied on the 8.2dev code, so I'm surprised that 8.2dev
> behaves the same as 8.1. Does your problem have any relationship to
> what's described there?

Probably it is related. I cannot tell whether the locking method is bad, or whether the number of LRU buffers is simply too small.

> I also wondered whether the problem may be that the number of SLRU
> buffers we use for subtrans is too low. But the number was increased
> from the default 8 to 32 in 8.2dev as well. Maybe you could try
> increasing that even further; say 128 and see if the problem is still
> there. (src/include/access/subtrans.h, NUM_SUBTRANS_BUFFERS).

In PostgreSQL 8.2, NUM_SUBTRANS_BUFFERS was changed to 128, the server was recompiled, and I measured again. No occurrence of CSStorm; the value of WIPS was about 400 (but WIPS fell to about 320 at intervals of 4 to 6 minutes). If the number of SLRU buffers is too low, I think increasing the number of buffers in PostgreSQL 8.1.4 would bring the same result. (Although the buffers for CLOG and multixact also increase, I think that effect is small.) Now NUM_SLRU_BUFFERS is changed to 128 in PostgreSQL 8.1.4 and measurement is under way.

regards,
Katsuhiko Okano
okano katsuhiko _at_ oss ntt co jp
[HACKERS] poor performance with Context Switch Storm at TPC-W.
Hi, all.

The problem has occurred at one of my customers: poor performance with a Context Switch Storm, in the configuration below. Usually CS is about 5000 and WIPS = 360; when the CSStorm occurs, WIPS drops to 60 or less. (WIPS = number of web interactions per second.)

It is under investigation using a patch which counts LWLock acquisitions. I suspected contention on BufMappingLock, but from the collected results the occurrence of the CSStorm does not seem to correspond to an increase in the BufMappingLock counts. Instead, SubtransControlLock and SubTrans were increasing. I do not understand what the cause of the CSStorm is.

[DB server]*1
  Intel Xeon 3.0GHz*4 (2 CPUs * H/T ON)
  4GB Memory
  Red Hat Enterprise Linux ES release 4 (Nahant Update 3)
  Linux version 2.6.9-34.ELsmp
  PostgreSQL 8.1.3 (it also occurred with version 8.2 (HEAD as of 6/15))
  shared_buffers = 131072
  temp_buffers = 1000
  max_connections = 300
[AP server]*2
  200-connection pooling, TPC-W model workload
[Client]*4
  TPC-W model workload

(1) I have read the following discussion:
http://archives.postgresql.org/pgsql-hackers/2006-05/msg01003.php
  From: Tom Lane tgl ( at ) sss ( dot ) pgh ( dot ) pa ( dot ) us
  To: josh ( at ) agliodbs ( dot ) com
  Subject: Re: Further reduction of bufmgr lock contention
  Date: Wed, 24 May 2006 15:25:26 -0400
If there is a patch or a technique for investigating this, would someone show it to me?

(2) It seems that many sequential scans were running when the CSStorm occurred. When reading a tuple, the visibility check is done, and it appears to trigger a subtransaction lookup for every transaction. How likely is it that the LWLock for SubTrans is causing the CSStorm?

best regards.
Katsuhiko Okano
okano katsuhiko _at_ oss ntt co jp
Re: [HACKERS] poor performance with Context Switch Storm at TPC-W.
Hello.

> Do you have bgwriter on, and what are its parameters? I read a theory
> somewhere that the bgwriter scans a large portion of memory and causes
> L1/L2 thrashing, so with HT on, the other backends sharing the physical
> processor with it also get thrashed ... So try turning bgwriter off, or
> turning HT off, and see what the difference is.

bgwriter is ON. In postgresql.conf:

# - Background writer -
bgwriter_delay = 200            # 10-10000 milliseconds between rounds
bgwriter_lru_percent = 1.0      # 0-100% of LRU buffers scanned/round
bgwriter_lru_maxpages = 5       # 0-1000 buffers max written/round
bgwriter_all_percent = 0.333    # 0-100% of all buffers scanned/round
bgwriter_all_maxpages = 5       # 0-1000 buffers max written/round

I tried turning H/T OFF, but the CSStorm still occurred. Usually CS is about 5000. (When the CSStorm occurs, CS is a smaller value than in the H/T ON case; I think that is because the CPU throughput fell.)

Regards
Katsuhiko Okano
okano katsuhiko _at_ oss ntt co jp