Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
Robert Lor writes:
> Here is the breakdown between exclusive and shared LWLocks. Do the numbers look reasonable to you?

Yeah, those seem plausible, although the hold time for CheckpointStartLock seems awfully high --- about 20 msec per transaction. Are you using a nonzero commit_delay?

regards, tom lane
Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
Tatsuo Ishii writes:
> Interesting. We (some Japanese companies including SRA OSS, Inc. Japan) did some PG scalability testing using a big Unisys machine with 16 (physical) CPUs and found that PG scales up to 8 CPUs. Beyond 8 CPUs, however, PG does not scale anymore. The results can be viewed at the OSS iPedia web site (http://ossipedia.ipa.go.jp). From analyzing the oprofile results, our conclusion was that PG has a serious lock contention problem in this environment.

Can you retry this test case using CVS tip? I'm curious to see if having partitioned the BufMappingLock helps ...

regards, tom lane
Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
Tom Lane wrote:
> Yeah, those seem plausible, although the hold time for CheckpointStartLock seems awfully high --- about 20 msec per transaction. Are you using a nonzero commit_delay?

I didn't change commit_delay which defaults to zero.

Regards,
-Robert
Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
Robert Lor writes:
> Tom Lane wrote:
>> Yeah, those seem plausible, although the hold time for CheckpointStartLock seems awfully high --- about 20 msec per transaction. Are you using a nonzero commit_delay?
> I didn't change commit_delay which defaults to zero.

Hmmm ... AFAICS this must mean that flushing the WAL data to disk at transaction commit time takes (most of) 20 msec on your hardware. Which still seems high --- on most modern disks that'd be at least two disk revolutions, maybe more. What's the disk hardware you're testing on, particularly its RPM spec?

regards, tom lane
Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
Tom Lane wrote:
> Hmmm ... AFAICS this must mean that flushing the WAL data to disk at transaction commit time takes (most of) 20 msec on your hardware. Which still seems high --- on most modern disks that'd be at least two disk revolutions, maybe more. What's the disk hardware you're testing on, particularly its RPM spec?

I actually ran the test on my laptop. It has an Ultra ATA/100 drive (5400 rpm). The test was just a quickie to show some data from the probes. I'll collect and share data from the T2000 server later.

Regards,
-Robert
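As a rough check on the "two disk revolutions" remark, the sketch below converts a time interval into platter revolutions for a given spindle speed. The function name and the 7200 and 10000 rpm comparison values are illustrative assumptions; only the 20 msec figure and the 5400 rpm laptop drive come from the thread.

```python
def revolutions_during(interval_ms: float, rpm: float) -> float:
    """Number of platter revolutions completed in interval_ms at the given rpm."""
    ms_per_revolution = 60_000.0 / rpm  # 60,000 ms in one minute
    return interval_ms / ms_per_revolution


# 20 ms is the per-transaction hold time discussed in the thread;
# 7200 and 10000 rpm are added here only for comparison (assumption).
for rpm in (5400, 7200, 10000):
    revs = revolutions_during(20.0, rpm)
    print(f"{rpm:>5} rpm: {revs:.2f} revolutions in 20 ms")
```

On a 5400 rpm drive one revolution takes about 11.1 ms, so a 20 msec commit is a little under two revolutions, which is roughly what a WAL flush waiting on rotation would be expected to cost.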
Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
Tom Lane wrote:
> Tatsuo Ishii writes:
>>> 18% in s_lock is definitely bad :-(. Were you able to determine which LWLock(s) are accounting for the contention?
>> Sorry for the delay. Finally I got the oprofile data. It's huge (34MB). If you are interested, I can put it somewhere. Please let me know.
>
> I finally got a chance to look at this, and it seems clear that all the traffic is on the BufMappingLock. This is essentially the same problem we were discussing with respect to Gavin Hamill's report of poor performance on an 8-way IBM PPC64 box (see hackers archives around 2006-04-21). If your database is fully cached in shared buffers, then you can do a whole lot of buffer accesses per unit time, and even though all the BufMappingLock acquisitions are in shared-LWLock mode, the LWLock's spinlock ends up being heavily contended on an SMP box.
>
> It's likely that CVS HEAD would show somewhat better performance because of the btree change to cache local copies of index metapages (which eliminates a fair fraction of buffer accesses, at least in Gavin's test case). Getting much further than that seems to require partitioning the buffer mapping table. The last discussion stalled on my concerns about unpredictable shared memory usage, but I have some ideas on that which I'll post separately.
>
> In the meantime, thanks for sending along the oprofile data!
>
> regards, tom lane

I ran pgbench and fired up a DTrace script using the lwlock probes we've added, and it looks like BufMappingLock is the most contended lock, but CheckpointStartLocks are held for longer duration!

Lock Id              Mode       Count
ControlFileLock      Exclusive      1
SubtransControlLock  Exclusive      1
BgWriterCommLock     Exclusive      6
FreeSpaceLock        Exclusive      6
FirstLockMgrLock     Exclusive     48
BufFreelistLock      Exclusive     74
BufMappingLock       Exclusive     74
CLogControlLock      Exclusive    184
XidGenLock           Exclusive    184
CheckpointStartLock  Shared       185
WALWriteLock         Exclusive    185
ProcArrayLock        Exclusive    368
CLogControlLock      Shared       552
SubtransControlLock  Shared      1273
WALInsertLock        Exclusive   1476
XidGenLock           Shared      1842
ProcArrayLock        Shared      3160
SInvalLock           Shared      3684
BufMappingLock       Shared     14578

Lock Id              Combined Time (ns)
ControlFileLock                    7915
BgWriterCommLock                  43438
FreeSpaceLock                        39
BufFreelistLock                  448530
FirstLockMgrLock                2879957
CLogControlLock                 4237750
SubtransControlLock             6378042
XidGenLock                      9500422
WALInsertLock                  16372040
SInvalLock                     23284554
ProcArrayLock                  32188638
BufMappingLock                113128512
WALWriteLock                  142391501
CheckpointStartLock          4171106665

Regards,
-Robert
Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
Robert Lor writes:
> I ran pgbench and fired up a DTrace script using the lwlock probes we've added, and it looks like BufMappingLock is the most contended lock, but CheckpointStartLocks are held for longer duration!

Those numbers look a bit suspicious --- I'd expect to see some of the LWLocks being taken in both shared and exclusive modes, but you don't show any such cases. You sure your script is counting correctly? Also, it'd be interesting to count time spent holding shared lock separately from time spent holding exclusive.

regards, tom lane
Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
Tom Lane wrote:
> Those numbers look a bit suspicious --- I'd expect to see some of the LWLocks being taken in both shared and exclusive modes, but you don't show any such cases. You sure your script is counting correctly?

I'll double check to make sure no stupid mistakes were made!

> Also, it'd be interesting to count time spent holding shared lock separately from time spent holding exclusive.

Will provide that data later today.

Regards,
-Robert
Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
Tom Lane wrote:
> Also, it'd be interesting to count time spent holding shared lock separately from time spent holding exclusive.

Tom,

Here is the breakdown between exclusive and shared LWLocks. Do the numbers look reasonable to you?

Regards,
-Robert

bash-3.00# time ./Tom_lwlock_acquire.d `pgrep -n postgres`

** LWLock Count: Exclusive **
Lock Id              Mode       Count
ControlFileLock      Exclusive      1
FreeSpaceLock        Exclusive      9
XidGenLock           Exclusive    202
CLogControlLock      Exclusive    203
WALWriteLock         Exclusive    203
BgWriterCommLock     Exclusive    222
BufFreelistLock      Exclusive    305
BufMappingLock       Exclusive    305
ProcArrayLock        Exclusive    405
FirstLockMgrLock     Exclusive    670
WALInsertLock        Exclusive   1616

** LWLock Count: Shared **
Lock Id              Mode       Count
CheckpointStartLock  Shared       202
CLogControlLock      Shared       450
SubtransControlLock  Shared       776
XidGenLock           Shared      2020
ProcArrayLock        Shared      3778
SInvalLock           Shared      4040
BufMappingLock       Shared     40838

** LWLock Time: Exclusive **
Lock Id              Combined Time (ns)
ControlFileLock                    8301
FreeSpaceLock                     80590
CLogControlLock                 1603557
BgWriterCommLock                1607122
BufFreelistLock                 1997406
XidGenLock                      2312442
BufMappingLock                  3161683
FirstLockMgrLock                5392575
ProcArrayLock                   6034396
WALInsertLock                  12277693
WALWriteLock                  324869744

** LWLock Time: Shared **
Lock Id              Combined Time (ns)
CLogControlLock                 3183788
SubtransControlLock             6956229
XidGenLock                     12012576
SInvalLock                     35567976
ProcArrayLock                  45400779
BufMappingLock                300669441
CheckpointStartLock          4056134243

real    0m24.718s
user    0m0.382s
sys     0m0.181s
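The "about 20 msec per transaction" figure quoted elsewhere in the thread can be reproduced by dividing each lock's combined hold time by its acquisition count. The sketch below does that for the shared-mode numbers in this message plus the exclusive WALWriteLock entry; the dictionary and function names are illustrative, and treating one CheckpointStartLock acquisition as one transaction is an assumption about the pgbench workload.

```python
# lock name -> (acquisition count, combined hold time in ns),
# copied from the shared-mode tables in this message
SHARED = {
    "CheckpointStartLock": (202, 4056134243),
    "CLogControlLock": (450, 3183788),
    "SubtransControlLock": (776, 6956229),
    "XidGenLock": (2020, 12012576),
    "ProcArrayLock": (3778, 45400779),
    "SInvalLock": (4040, 35567976),
    "BufMappingLock": (40838, 300669441),
}

# Exclusive-mode WALWriteLock, included for comparison
WAL_WRITE_EXCLUSIVE = (203, 324869744)


def mean_hold_ms(count: int, total_ns: int) -> float:
    """Average hold time per acquisition, in milliseconds."""
    return total_ns / count / 1e6


for name, (count, total_ns) in sorted(
    SHARED.items(), key=lambda item: mean_hold_ms(*item[1]), reverse=True
):
    print(f"{name:<20} {mean_hold_ms(count, total_ns):10.4f} ms per acquisition")

print(f"{'WALWriteLock (excl)':<20} {mean_hold_ms(*WAL_WRITE_EXCLUSIVE):10.4f} ms per acquisition")
```

Read this way, BufMappingLock is acquired far more often but held only for microseconds each time, while CheckpointStartLock is held for roughly 20 ms per acquisition.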
Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
Tatsuo Ishii writes:
>> 18% in s_lock is definitely bad :-(. Were you able to determine which LWLock(s) are accounting for the contention?
> Sorry for the delay. Finally I got the oprofile data. It's huge (34MB). If you are interested, I can put it somewhere. Please let me know.

I finally got a chance to look at this, and it seems clear that all the traffic is on the BufMappingLock. This is essentially the same problem we were discussing with respect to Gavin Hamill's report of poor performance on an 8-way IBM PPC64 box (see hackers archives around 2006-04-21). If your database is fully cached in shared buffers, then you can do a whole lot of buffer accesses per unit time, and even though all the BufMappingLock acquisitions are in shared-LWLock mode, the LWLock's spinlock ends up being heavily contended on an SMP box.

It's likely that CVS HEAD would show somewhat better performance because of the btree change to cache local copies of index metapages (which eliminates a fair fraction of buffer accesses, at least in Gavin's test case). Getting much further than that seems to require partitioning the buffer mapping table. The last discussion stalled on my concerns about unpredictable shared memory usage, but I have some ideas on that which I'll post separately.

In the meantime, thanks for sending along the oprofile data!

regards, tom lane
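The idea of partitioning a lookup table so that accesses to different keys do not all pass through one lock can be sketched as follows. This is a generic illustration only, not PostgreSQL's buffer manager: the class name, the partition count of 16, and the use of plain Python mutexes (rather than shared/exclusive LWLocks guarded by a spinlock) are all assumptions made for the example.

```python
import threading


class PartitionedTable:
    """Hash table split into partitions, each guarded by its own lock."""

    def __init__(self, num_partitions: int = 16):
        self._locks = [threading.Lock() for _ in range(num_partitions)]
        self._maps = [{} for _ in range(num_partitions)]

    def _partition(self, key) -> int:
        # The key's hash decides which partition (and which lock) is used.
        return hash(key) % len(self._locks)

    def insert(self, key, value) -> None:
        p = self._partition(key)
        with self._locks[p]:  # only this partition is locked
            self._maps[p][key] = value

    def lookup(self, key):
        p = self._partition(key)
        with self._locks[p]:
            return self._maps[p].get(key)


# Hypothetical usage: map a (relation, block) pair to a buffer number.
table = PartitionedTable(num_partitions=16)
table.insert((16384, 7), 42)
print(table.lookup((16384, 7)))
```

Threads touching keys that hash to different partitions never contend on the same lock, which is the effect the thread hopes to get from partitioning the buffer mapping table. The trade-off mentioned in the message, harder-to-predict shared memory usage, is not modeled here.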
Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
> Tatsuo Ishii writes:
>>> Interesting. We (some Japanese companies including SRA OSS, Inc. Japan) did some PG scalability testing using a big Unisys machine with 16 (physical) CPUs and found that PG scales up to 8 CPUs. Beyond 8 CPUs, however, PG does not scale anymore. The results can be viewed at the OSS iPedia web site (http://ossipedia.ipa.go.jp). From analyzing the oprofile results, our conclusion was that PG has a serious lock contention problem in this environment.
>> 18% in s_lock is definitely bad :-(. Were you able to determine which LWLock(s) are accounting for the contention?
> Yes. We were interested in that too. Some people did additional tests to determine that. I don't have the report handy now. I will report back next week.

Sorry for the delay. Finally I got the oprofile data. It's huge (34MB). If you are interested, I can put it somewhere. Please let me know.

--
Tatsuo Ishii
SRA OSS, Inc. Japan
Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
Tatsuo Ishii writes:
>>> 18% in s_lock is definitely bad :-(. Were you able to determine which LWLock(s) are accounting for the contention?
>> Yes. We were interested in that too. Some people did additional tests to determine that. I don't have the report handy now. I will report back next week.
> Sorry for the delay. Finally I got the oprofile data. It's huge (34MB). If you are interested, I can put it somewhere. Please let me know.

Yes, please.

regards, tom lane
Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL community
On Jun 16, 2006, at 12:01 PM, Josh Berkus wrote:
> Folks,
>
>> I am thrilled to inform you all that Sun has just donated a fully loaded T2000 system to the PostgreSQL community, and it's being set up by Corey Shields at OSL (osuosl.org) and should be online probably early next week. The system has
>
> So this system will be hosted by Open Source Lab in Oregon. It's going to be donated to Software In the Public Interest, who will own it for the PostgreSQL fund.
>
> We'll want to figure out a scheduling system to schedule performance and compatibility testing on this machine; I'm not sure exactly how that will work. Suggestions welcome. As a warning, Gavin Sherry and I already have a bunch of pending tests to run.
>
> First thing as soon as I have a login, of course, is to set up a Buildfarm instance.
>
> --
> --Josh
>
> Josh Berkus
> PostgreSQL @ Sun
> San Francisco

--
Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED]
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL community
Folks,

> I am thrilled to inform you all that Sun has just donated a fully loaded T2000 system to the PostgreSQL community, and it's being set up by Corey Shields at OSL (osuosl.org) and should be online probably early next week. The system has

So this system will be hosted by Open Source Lab in Oregon. It's going to be donated to Software In the Public Interest, who will own it for the PostgreSQL fund.

We'll want to figure out a scheduling system to schedule performance and compatibility testing on this machine; I'm not sure exactly how that will work. Suggestions welcome. As a warning, Gavin Sherry and I already have a bunch of pending tests to run.

First thing as soon as I have a login, of course, is to set up a Buildfarm instance.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco
Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL community
On 16-6-2006 17:18, Robert Lor wrote:
> I think this system is well suited for PG scalability testing, among others. We did an informal test using an internal OLTP benchmark and noticed that PG can scale to around 8 CPUs. Would be really cool if all 32 virtual CPUs can be utilized!!!

I can already confirm very good scalability (with our workload) for PostgreSQL on that machine. We've been testing a 32-thread/16G version and it shows near-linear scaling when enabling 1, 2, 4, 6 and 8 cores (with all four threads enabled).

The threads are a bit less scalable, but still pretty good. Going from 1 to 2 or 4 threads per core yields 60% and 130% extra performance, respectively.

Best regards,

Arjen
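To put the hardware-thread numbers in perspective, the sketch below turns "extra performance" percentages into speedup and per-thread efficiency. It assumes the 60% and 130% gains are measured relative to one thread per core, which is one reading of the message; the function names are illustrative.

```python
def speedup(extra_percent: float) -> float:
    """Throughput relative to the one-thread-per-core baseline."""
    return 1.0 + extra_percent / 100.0


def per_thread_efficiency(threads: int, extra_percent: float) -> float:
    """Fraction of ideal linear scaling achieved with `threads` threads per core."""
    return speedup(extra_percent) / threads


# (threads per core, extra performance in percent) as reported in the thread
for threads, gain in ((2, 60.0), (4, 130.0)):
    print(
        f"{threads} threads/core: speedup {speedup(gain):.2f}x, "
        f"efficiency {per_thread_efficiency(threads, gain):.1%}"
    )
```

Under that assumption, two threads per core reach 80% of linear scaling and four threads reach about 58%, consistent with "a bit less scalable, but still pretty good".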
Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
> I am thrilled to inform you all that Sun has just donated a fully loaded T2000 system to the PostgreSQL community, and it's being set up by Corey Shields at OSL (osuosl.org) and should be online probably early next week. The system has
>
> * 8 cores, 4 hw threads/core @ 1.2 GHz. Solaris sees the system as having 32 virtual CPUs, and each can be enabled or disabled individually
> * 32 GB of DDR2 SDRAM memory
> * 2 @ 73GB internal SAS drives (1 RPM)
> * 4 Gigabit ethernet ports
>
> For complete spec, visit http://www.sun.com/servers/coolthreads/t2000/specifications.jsp
>
> I think this system is well suited for PG scalability testing, among others. We did an informal test using an internal OLTP benchmark and noticed that PG can scale to around 8 CPUs. Would be really cool if all 32 virtual CPUs can be utilized!!!

Interesting. We (some Japanese companies including SRA OSS, Inc. Japan) did some PG scalability testing using a big Unisys machine with 16 (physical) CPUs and found that PG scales up to 8 CPUs. Beyond 8 CPUs, however, PG does not scale anymore. The results can be viewed at the OSS iPedia web site (http://ossipedia.ipa.go.jp). From analyzing the oprofile results, our conclusion was that PG has a serious lock contention problem in this environment.

You can take a look at the detailed report at: http://ossipedia.ipa.go.jp/capacity/EV0604210111/ (unfortunately only Japanese content is available at the moment. Please use an automatic translation service.)

The evaluation environment was:

PostgreSQL 8.1.2
OSDL DBT-1 2.1
Miracle Linux 4.0
Unisys ES700 Xeon 2.8GHz CPU x 16 Mem 16GB (HT off)

--
Tatsuo Ishii
SRA OSS, Inc. Japan
Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
Tatsuo Ishii writes:
> Interesting. We (some Japanese companies including SRA OSS, Inc. Japan) did some PG scalability testing using a big Unisys machine with 16 (physical) CPUs and found that PG scales up to 8 CPUs. Beyond 8 CPUs, however, PG does not scale anymore. The results can be viewed at the OSS iPedia web site (http://ossipedia.ipa.go.jp). From analyzing the oprofile results, our conclusion was that PG has a serious lock contention problem in this environment.

18% in s_lock is definitely bad :-(. Were you able to determine which LWLock(s) are accounting for the contention?

The test case seems to be spending a remarkable amount of time in LIKE comparisons, too. That probably is not a representative condition.

regards, tom lane
Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
> Tatsuo Ishii writes:
>> Interesting. We (some Japanese companies including SRA OSS, Inc. Japan) did some PG scalability testing using a big Unisys machine with 16 (physical) CPUs and found that PG scales up to 8 CPUs. Beyond 8 CPUs, however, PG does not scale anymore. The results can be viewed at the OSS iPedia web site (http://ossipedia.ipa.go.jp). From analyzing the oprofile results, our conclusion was that PG has a serious lock contention problem in this environment.
>
> 18% in s_lock is definitely bad :-(. Were you able to determine which LWLock(s) are accounting for the contention?

Yes. We were interested in that too. Some people did additional tests to determine that. I don't have the report handy now. I will report back next week.

> The test case seems to be spending a remarkable amount of time in LIKE comparisons, too. That probably is not a representative condition.

I know. I think the point is that 18% in s_lock only appears with 12 CPUs or more.

--
Tatsuo Ishii
SRA OSS, Inc. Japan
Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL community
Arjen van der Meijden wrote:
> I can already confirm very good scalability (with our workload) for PostgreSQL on that machine. We've been testing a 32-thread/16G version and it shows near-linear scaling when enabling 1, 2, 4, 6 and 8 cores (with all four threads enabled). The threads are a bit less scalable, but still pretty good. Going from 1 to 2 or 4 threads per core yields 60% and 130% extra performance, respectively.

Wow, what type of workload is it? And did you do much tuning to get near-linear scalability to 32 threads?

Regards,
-Robert