Re: [PERFORM] I/O on select count(*)
Tom Lane wrote:
> It's certainly true that hint-bit updates cost something, but
> quantifying how much isn't easy.

Maybe we can instrument the code with DTrace probes to quantify the actual costs. I'm not familiar with the code, but if I know where to place the probes, I can easily do a quick test and provide the data.

> The off-the-cuff answer is to do the select count(*) twice and see how
> much cheaper the second one is.

It doesn't seem the second run is any cheaper, as shown in the results below. The data came from the probes I added recently.

*** Run #1 ***

SQL Statement  : select count(*) from accounts;
Execution time : 1086.58 (ms)

Buffer Read Counts
Tablespace  Database  Table  Count
      1663     16384   1247      1
      1663     16384   2600      1
      1663     16384   2703      1
      1663     16384   1255      2
      1663     16384   2650      2
      1663     16384   2690      3
      1663     16384   2691      3
      1663     16384  16397   8390

Dirty Buffer Write Counts
Tablespace  Database  Table  Count
      1663     16384  16397   2865

Total buffer cache hits  : 1932
Total buffer cache misses: 6471
Average read time from cache :   5638 (ns)
Average read time from disk  : 143371 (ns)
Average write time to disk   :  20368 (ns)

*** Run #2 ***

SQL Statement  : select count(*) from accounts;
Execution time : 1115.94 (ms)

Buffer Read Counts
Tablespace  Database  Table  Count
      1663     16384  16397   8390

Dirty Buffer Write Counts
Tablespace  Database  Table  Count
      1663     16384  16397   2865

Total buffer cache hits  : 1931
Total buffer cache misses: 6459
Average read time from cache :   4357 (ns)
Average read time from disk  : 154127 (ns)
Average write time to disk   :  20368 (ns)

-Robert

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
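As a quick sanity check on Run #1, the probe counters above can be combined into a hit ratio and an estimated total read time. This is a small sketch of my own (the counter values are copied from the output above; the breakdown logic is not part of the probe script):

```python
# Derive summary figures from the Run #1 counters quoted above.
def io_summary(hits, misses, cache_ns, disk_ns):
    """Return total reads, cache hit ratio, and estimated read time in ms."""
    total = hits + misses
    hit_ratio = hits / total
    est_read_ms = (hits * cache_ns + misses * disk_ns) / 1e6
    return total, hit_ratio, est_read_ms

# Run #1: 1932 hits, 6471 misses, 5638 ns/cache read, 143371 ns/disk read
total, ratio, read_ms = io_summary(1932, 6471, 5638, 143371)
print(f"reads={total}  hit ratio={ratio:.1%}  est. read time={read_ms:.0f} ms")
```

This works out to roughly 939 ms of read time against a 1086 ms execution time, nearly all of it from disk, which is consistent with the second run being no cheaper: the table doesn't fit in shared buffers, so both runs go to disk.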
Re: [PERFORM] I/O on select count(*)
Tom Lane wrote:
> Hmm, the problem would be trying to figure out what percentage of
> writes could be blamed solely on hint-bit updates and not any other
> change to the page. I don't think that the bufmgr currently keeps
> enough state to know that, but you could probably modify it easily
> enough, since callers distinguish MarkBufferDirty from
> SetBufferCommitInfoNeedsSave. Define another flag bit that's set only
> by the first, and test it during write-out.

Ok, I made a few changes to bufmgr per my understanding of your description above and with my limited understanding of the code. Patch is attached. Assuming the patch is correct, the effect of writes due to hint bits is quite significant. I collected the data below by running pgbench in one terminal and psql in another to run the query. Is the data plausible?

-Robert

--
Backend PID    : 16189
SQL Statement  : select count(*) from accounts;
Execution time : 17.33 sec

Buffer Read Counts
Tablespace  Database  Table  Count
      1663     16384   2600      1
      1663     16384   2601      1
      1663     16384   2615      1
      1663     16384   1255      2
      1663     16384   2602      2
      1663     16384   2603      2
      1663     16384   2616      2
      1663     16384   2650      2
      1663     16384   2678      2
      1663     16384   1247      3
      1663     16384   1249      3
      1663     16384   2610      3
      1663     16384   2655      3
      1663     16384   2679      3
      1663     16384   2684      3
      1663     16384   2687      3
      1663     16384   2690      3
      1663     16384   2691      3
      1663     16384   2703      4
      1663     16384   1259      5
      1663     16384   2653      5
      1663     16384   2662      5
      1663     16384   2663      5
      1663     16384   2659      7
      1663     16384  16397   8390

Dirty Buffer Write Counts
Tablespace  Database  Table  Count
      1663     16384  16402      2
      1663     16384  16394     11
      1663     16384  16397   4771

== Hint Bits Write Counts ==
Tablespace  Database  Table  Count
      1663     16384  16397   4508

Total buffer cache hits  : 732
Total buffer cache misses: 7731
Average read time from cache :   9136 (ns)
Average read time from disk  : 384201 (ns)
Average write time to disk   : 210709 (ns)

Backend PID    : 16189
SQL Statement  : select count(*) from accounts;
Execution time : 12.72 sec

Buffer Read Counts
Tablespace  Database  Table  Count
      1663     16384  16397   8392

Dirty Buffer Write Counts
Tablespace  Database  Table  Count
      1663     16384  16394      6
      1663     16384  16402      7
      1663     16384  16397   2870

== Hint Bits Write Counts ==
Tablespace  Database  Table  Count
      1663     16384  16402      2
      1663     16384  16397   2010

Total buffer cache hits  : 606
Total buffer cache misses: 7786
Average read time from cache :   6949 (ns)
Average read time from disk  : 706288 (ns)
Average write time to disk   :  90426 (ns)

Index: bufmgr.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v
retrieving revision 1.228
diff -u -3 -p -r1.228 bufmgr.c
--- bufmgr.c	1 Jan 2008 19:45:51 -0000	1.228
+++ bufmgr.c	15 May 2008 20:56:38 -0000
@@ -42,6 +42,7 @@
 #include "storage/smgr.h"
 #include "utils/resowner.h"
 #include "pgstat.h"
+#include "pg_trace.h"

 /* Note: these two macros only work on shared buffers, not local ones! */

@@ -171,6 +172,7 @@ ReadBuffer_common(Relation reln, BlockNu
 	if (isExtend)
 		blockNum = smgrnblocks(reln->rd_smgr);

+	TRACE_POSTGRESQL_BUFFER_READ_START(blockNum, reln->rd_node.spcNode, reln->rd_node.dbNode, reln->rd_node.relNode, isLocalBuf);
 	pgstat_count_buffer_read(reln);

 	if (isLocalBuf)
@@ -200,12 +202,16 @@ ReadBuffer_common(Relation reln, BlockNu
 	{
 		if (!isExtend)
 		{
+			TRACE_POSTGRESQL_BUFFER_HIT();
 			/* Just need to update stats before we exit */
 			pgstat_count_buffer_hit(reln);

 			if (VacuumCostActive)
 				VacuumCostBalance += VacuumCostPageHit;

+			TRACE_POSTGRESQL_BUFFER_READ_DONE(blockNum,
+				reln->rd_node.spcNode, reln->rd_node.dbNode,
+				reln->rd_node.relNode, isLocalBuf, found);
 			return BufferDescriptorGetBuffer(bufHdr);
 		}

@@ -257,6 +263,7 @@ ReadBuffer_common(Relation reln, BlockNu
 			} while (!StartBufferIO(bufHdr, true));
 		}
 	}
+	TRACE_POSTGRESQL_BUFFER_MISS();

 	/*
 	 * if we have gotten to this
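The hint-bit share of the dirty-buffer writes on table 16397 in the two runs above can be checked with a couple of divisions. A small sketch of my own (the figures are copied from the patched bufmgr's output; the calculation is not part of the patch):

```python
# What fraction of the dirty-buffer writes on table 16397 were
# flagged as hint-bit-only by the patched bufmgr?
def hint_bit_share(hint_writes, total_writes):
    return hint_writes / total_writes

# run 1: 4508 hint-bit writes of 4771 total; run 2: 2010 of 2870
for run, (hint, total) in enumerate([(4508, 4771), (2010, 2870)], start=1):
    print(f"run {run}: {hint_bit_share(hint, total):.1%} of writes were hint-bit only")
```

So roughly 94% of the writes in the first run, and 70% in the second, would be attributable to hint-bit updates alone — which is why the effect looks so significant.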
Re: [PERFORM] User concurrency thresholding: where do I look?
Tom Lane wrote:
> That path would be taking CLogControlLock ... so you're off by at
> least one. Compare the script to src/include/storage/lwlock.h.

Indeed, the indexing was off by one: BuffMappingLock was removed from src/include/storage/lwlock.h between 8.1 and 8.2, and the DTrace script was not updated to match.

Thanks,
Robert

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
Tom Lane wrote:
> Yeah, those seem plausible, although the hold time for
> CheckpointStartLock seems awfully high --- about 20 msec per
> transaction. Are you using a nonzero commit_delay?

I didn't change commit_delay, which defaults to zero.

Regards,
-Robert
Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
Tom Lane wrote:
> Hmmm ... AFAICS this must mean that flushing the WAL data to disk at
> transaction commit time takes (most of) 20 msec on your hardware.
> Which still seems high --- on most modern disks that'd be at least two
> disk revolutions, maybe more. What's the disk hardware you're testing
> on, particularly its RPM spec?

I actually ran the test on my laptop. It has an Ultra ATA/100 drive (5400 RPM). The test was just a quickie to show some data from the probes. I'll collect and share data from the T2000 server later.

Regards,
-Robert
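Tom's back-of-the-envelope reasoning checks out for a 5400 RPM laptop drive: one platter revolution takes 60/5400 s, so a commit that waits out one to two revolutions of WAL flush lands right around the observed ~20 msec. A quick sketch of the arithmetic (mine, not from the thread):

```python
# Time for one platter revolution at a given spindle speed.
def revolution_ms(rpm):
    return 60_000 / rpm  # 60,000 ms per minute / revolutions per minute

for rpm in (5400, 7200, 15000):
    rev = revolution_ms(rpm)
    print(f"{rpm:>5} RPM: {rev:.2f} ms/rev, 2 revs = {2 * rev:.2f} ms")
```

At 5400 RPM a revolution is about 11.1 ms, so two revolutions is roughly 22 ms — consistent with the ~20 msec CheckpointStartLock hold time per transaction reported earlier.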
Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
Tom Lane wrote:
> Tatsuo Ishii [EMAIL PROTECTED] writes:
>>> 18% in s_lock is definitely bad :-(. Were you able to determine
>>> which LWLock(s) are accounting for the contention?
>> Sorry for the delay. Finally I got the oprofile data. It's huge
>> (34MB). If you are interested, I can put it somewhere. Please let me
>> know.
>
> I finally got a chance to look at this, and it seems clear that all
> the traffic is on the BufMappingLock. This is essentially the same
> problem we were discussing with respect to Gavin Hamill's report of
> poor performance on an 8-way IBM PPC64 box (see hackers archives
> around 2006-04-21). If your database is fully cached in shared
> buffers, then you can do a whole lot of buffer accesses per unit time,
> and even though all the BufMappingLock acquisitions are in
> shared-LWLock mode, the LWLock's spinlock ends up being heavily
> contended on an SMP box.
>
> It's likely that CVS HEAD would show somewhat better performance
> because of the btree change to cache local copies of index metapages
> (which eliminates a fair fraction of buffer accesses, at least in
> Gavin's test case). Getting much further than that seems to require
> partitioning the buffer mapping table. The last discussion stalled on
> my concerns about unpredictable shared memory usage, but I have some
> ideas on that which I'll post separately. In the meantime, thanks for
> sending along the oprofile data!
>
> regards, tom lane

I ran pgbench and fired up a DTrace script using the lwlock probes we've added, and it looks like BufMappingLock is the most contended lock, but CheckpointStartLock is held for a longer duration!
Lock Id              Mode       Count
ControlFileLock      Exclusive      1
SubtransControlLock  Exclusive      1
BgWriterCommLock     Exclusive      6
FreeSpaceLock        Exclusive      6
FirstLockMgrLock     Exclusive     48
BufFreelistLock      Exclusive     74
BufMappingLock       Exclusive     74
CLogControlLock      Exclusive    184
XidGenLock           Exclusive    184
CheckpointStartLock  Shared       185
WALWriteLock         Exclusive    185
ProcArrayLock        Exclusive    368
CLogControlLock      Shared       552
SubtransControlLock  Shared      1273
WALInsertLock        Exclusive   1476
XidGenLock           Shared      1842
ProcArrayLock        Shared      3160
SInvalLock           Shared      3684
BufMappingLock       Shared     14578

Lock Id              Combined Time (ns)
ControlFileLock                    7915
BgWriterCommLock                  43438
FreeSpaceLock                        39
BufFreelistLock                  448530
FirstLockMgrLock                2879957
CLogControlLock                 4237750
SubtransControlLock             6378042
XidGenLock                      9500422
WALInsertLock                  16372040
SInvalLock                     23284554
ProcArrayLock                  32188638
BufMappingLock                113128512
WALWriteLock                  142391501
CheckpointStartLock          4171106665

Regards,
-Robert
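To make the "held for a longer duration" point concrete, the combined-time figures above can be ranked by share of total held time. A sketch of my own over the six largest entries from the output (the (lock, ns) pairs are copied from the DTrace aggregation; the ranking is not part of the script):

```python
# Share of combined held time, six largest entries from the output above.
times = {
    "CheckpointStartLock": 4_171_106_665,
    "WALWriteLock": 142_391_501,
    "BufMappingLock": 113_128_512,
    "ProcArrayLock": 32_188_638,
    "SInvalLock": 23_284_554,
    "WALInsertLock": 16_372_040,
}
total = sum(times.values())
for lock, ns in sorted(times.items(), key=lambda kv: -kv[1]):
    print(f"{lock:<22}{ns / total:6.1%}")
```

Even though BufMappingLock is acquired far more often (14578 shared acquisitions), CheckpointStartLock accounts for over 90% of the held time among these entries, despite only 185 acquisitions.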
Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
Tom Lane wrote:
> Those numbers look a bit suspicious --- I'd expect to see some of the
> LWLocks being taken in both shared and exclusive modes, but you don't
> show any such cases. You sure your script is counting correctly?

I'll double check to make sure no stupid mistakes were made!

> Also, it'd be interesting to count time spent holding shared lock
> separately from time spent holding exclusive.

Will provide that data later today.

Regards,
-Robert
Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
Tom Lane wrote:
> Also, it'd be interesting to count time spent holding shared lock
> separately from time spent holding exclusive.

Tom,

Here is the breakdown between exclusive and shared LWLocks. Do the numbers look reasonable to you?

Regards,
-Robert

bash-3.00# time ./Tom_lwlock_acquire.d `pgrep -n postgres`

** LWLock Count: Exclusive **
Lock Id              Mode       Count
ControlFileLock      Exclusive      1
FreeSpaceLock        Exclusive      9
XidGenLock           Exclusive    202
CLogControlLock      Exclusive    203
WALWriteLock         Exclusive    203
BgWriterCommLock     Exclusive    222
BufFreelistLock      Exclusive    305
BufMappingLock       Exclusive    305
ProcArrayLock        Exclusive    405
FirstLockMgrLock     Exclusive    670
WALInsertLock        Exclusive   1616

** LWLock Count: Shared **
Lock Id              Mode       Count
CheckpointStartLock  Shared       202
CLogControlLock      Shared       450
SubtransControlLock  Shared       776
XidGenLock           Shared      2020
ProcArrayLock        Shared      3778
SInvalLock           Shared      4040
BufMappingLock       Shared     40838

** LWLock Time: Exclusive **
Lock Id              Combined Time (ns)
ControlFileLock                    8301
FreeSpaceLock                     80590
CLogControlLock                 1603557
BgWriterCommLock                1607122
BufFreelistLock                 1997406
XidGenLock                      2312442
BufMappingLock                  3161683
FirstLockMgrLock                5392575
ProcArrayLock                   6034396
WALInsertLock                  12277693
WALWriteLock                  324869744

** LWLock Time: Shared **
Lock Id              Combined Time (ns)
CLogControlLock                 3183788
SubtransControlLock             6956229
XidGenLock                     12012576
SInvalLock                     35567976
ProcArrayLock                  45400779
BufMappingLock                300669441
CheckpointStartLock          4056134243

real    0m24.718s
user    0m0.382s
sys     0m0.181s
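Dividing the combined times by the acquisition counts gives the average hold time per acquisition, which is where the CheckpointStartLock result stands out. A sketch of my own combining the count and time tables above:

```python
# Average hold time per acquisition, from the count and time tables above.
def avg_hold_ms(total_ns, count):
    return total_ns / count / 1e6

# WALWriteLock: 324,869,744 ns over 203 exclusive acquisitions
print(f"WALWriteLock (exclusive):    {avg_hold_ms(324_869_744, 203):.2f} ms")
# CheckpointStartLock: 4,056,134,243 ns over 202 shared acquisitions
print(f"CheckpointStartLock (shared): {avg_hold_ms(4_056_134_243, 202):.2f} ms")
```

The shared CheckpointStartLock works out to about 20 ms per acquisition, matching Tom's earlier estimate of roughly 20 msec of commit-time WAL flushing per transaction, while the exclusive WALWriteLock averages about 1.6 ms.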
[PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL community
I am thrilled to inform you all that Sun has just donated a fully loaded T2000 system to the PostgreSQL community. It's being set up by Corey Shields at OSL (osuosl.org) and should be online probably early next week. The system has:

* 8 cores, 4 hw threads/core @ 1.2 GHz. Solaris sees the system as having 32 virtual CPUs, and each can be enabled or disabled individually
* 32 GB of DDR2 SDRAM memory
* 2 @ 73GB internal SAS drives (1 RPM)
* 4 Gigabit ethernet ports

For the complete spec, visit http://www.sun.com/servers/coolthreads/t2000/specifications.jsp

I think this system is well suited for PG scalability testing, among others. We did an informal test using an internal OLTP benchmark and noticed that PG can scale to around 8 CPUs. Would be really cool if all 32 virtual CPUs can be utilized!!!

Anyways, if you need to access the system for testing purposes, please contact Josh Berkus.

Regards,
Robert Lor
Sun Microsystems, Inc.
01-510-574-7189
Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL community
Arjen van der Meijden wrote:
> I can already confirm very good scalability (with our workload) on
> postgresql on that machine. We've been testing a 32thread/16G-version
> and it shows near-linear scaling when enabling 1, 2, 4, 6 and 8 cores
> (with all four threads enabled). The threads are a bit less scalable,
> but still pretty good. Enabling 1, 2 or 4 threads for each core yields
> resp. 60 and 130% extra performance.

Wow, what type of workload is it? And did you do much tuning to get near-linear scalability to 32 threads?

Regards,
-Robert
Re: [PERFORM] bad performance on Solaris 10
Bruce Momjian wrote on 04/13/06 01:39 AM:
> Yes, if someone wants to give us a clear answer on which wal_sync
> method is best on all versions of Solaris, we can easily make that
> change.

We're doing tests to see how various parameters in postgresql.conf affect performance on Solaris and will share the results shortly.

Regards,
-Robert
Re: [PERFORM] bad performance on Solaris 10
Chris Mair wrote:
> Ok, so I did a few runs for each of the sync methods, keeping all the
> rest constant and got this:
>
> open_datasync       0.7
> fdatasync           4.6
> fsync               4.5
> fsync_writethrough  not supported
> open_sync           0.6
>
> in arbitrary units - higher is faster.

Quite impressive!

Chris, just to make sure the x4100 config is similar to your Linux system, can you verify the default setting for the disk write cache and make sure it is either enabled on both or disabled on both? Here's how to check in Solaris. As root, run:

format -e -> pick a disk -> cache -> write_cache -> display

Not sure how to do it on Linux though!

Regards,
-Robert
Re: [PERFORM] Sun Fire T2000 and PostgreSQL 8.1.3
Tom is right. Unless your workload can generate lots of simultaneous queries, you will not reap the full benefit of the Sun Fire T2000 system. I have tested 8.1.3 with an OLTP workload on an 8-core system. With 1500-2000 client connections, the CPU was only about 30% utilized. The UltraSPARC T1 processor was designed for throughput, with many cores running at a lower frequency (1-1.2 GHz) to reduce power consumption. To speed up a single big query, you'd be better off with a parallelized DB or an Opteron system with a higher clock speed, like this one: http://www.sun.com/servers/entry/x4200/

Regards,
-Robert

Tom Lane wrote:
> Juan Casero \(FL FLC\) [EMAIL PROTECTED] writes:
>> ... This box has a single Ultrasparc T1 cpu with six execution
>> pipelines that can each handle 4 threads. With the Unix top utility
>> the postgresql server appears to bounce around between the available
>> threads on the system.
>
> Try sending it more than one query at a time? If you're testing with
> just one client connection issuing queries, that's about what I'd
> expect.
>
> regards, tom lane
[PERFORM] PostgreSQL Solaris packages now in beta
With big thanks to Josh Berkus and Devrim Gunduz, I'm happy to announce that Sun has just released a Solaris distribution of PostgreSQL 8.1.2 with ready-to-install packages for both SPARC and x86. These packages are currently in beta, and we expect to reach FCS in 2-3 weeks. The packages, along with an install guide, are available for download at http://pgfoundry.org/projects/solarispackages/

We have tightly integrated PostgreSQL with Solaris in a manner similar to the Linux distributions available on postgresql.org. In fact, the directory structures are identical. Starting with Solaris 10 Update 2, PostgreSQL will be distributed with every copy of Solaris, via download and physical media.

We welcome any and all feedback on this PostgreSQL Solaris distribution. Please subscribe to the [EMAIL PROTECTED] mailing list to give us feedback: http://pgfoundry.org/mail/?group_id=163

BTW, I'm a senior engineer at Sun Microsystems, recently working with the PostgreSQL community (namely Josh Berkus, Devrim Gunduz, and Gavin Sherry) on the Solaris Packages Project at PgFoundry, PostgreSQL performance optimization on Solaris, and leveraging Solaris 10 capabilities (e.g. DTrace) specifically for PostgreSQL. I'll be posting a Solaris performance tuning guide in a few weeks.

Regards,
Robert Lor