Re: [HACKERS] testing ProcArrayLock patches
Pavan Deolasee wrote:

> The numbers are not that bad, but definitely not as good as we saw
> on some other platforms.

Well, this machine is definitely designed to hold up under high
concurrency. As I understand it, each core is the memory manager for
two 4GB DIMMs, with two channels to them, each with two buffers. The
way the cores are connected, a core never needs to go through more
than one other core to get to memory it doesn't directly manage, and
that uses "snoop" technology which hands the cached data right over
from one core to the other when possible, rather than making the
core which now owns the cache line pull it from RAM. It seems the
2.6.32 kernel is able to manage that technology in a reasonable
fashion.

At first I was surprised to see performance top out on the update
tests between 80 and 96 clients. But then, that lands almost exactly
where my old reliable formula of ((2 * core count) + effective
spindle count) would predict. The SELECT-only tests peaked at 64
clients, but those were fully cached, so the effective spindle count
was zero, again fitting the formula. So these optimizations seem to
me to break down the barriers which had previously capped the number
of clients which could be handled, letting them peak at their
"natural" levels.

> But it's possible that they may improve in percentage terms with
> an even larger number of clients on this box.

I think so; I think this box is just so scalable that at 128 clients
we were just barely getting past the "knee" in the performance
graphs to where these patches help most.

-Kevin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
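Kevin's rule of thumb above can be sketched as a quick calculation.
The core count below is an assumption inferred from the thread (a
fully cached SELECT-only peak at 64 clients with zero effective
spindles implies 32 cores); the spindle counts are illustrative, not
figures stated in the message:

```shell
# Rule of thumb: optimal_clients = (2 * core_count) + effective_spindle_count
# Assumed: 32 cores (inferred from the 64-client read-only peak, not stated
# in the thread); the spindle counts passed in are illustrative.
cores=32

predict() {
    spindles=$1
    echo $(( 2 * cores + spindles ))
}

predict 0    # fully cached SELECT-only test: effective spindles = 0 -> 64
predict 16   # update test, low spindle estimate  -> 80
predict 32   # update test, high spindle estimate -> 96
```

With 16-32 effective spindles the prediction brackets the observed
80-96 client peak on the update tests.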
Re: [HACKERS] testing ProcArrayLock patches
On Tue, Nov 22, 2011 at 4:40 AM, Kevin Grittner wrote:
> Pavan Deolasee wrote:
>
>> It will be a great help if you could spare a few minutes to also
>> test the patch to take out the frequently accessed PGPROC members
>> to a different array. We are seeing good improvements on the HPUX
>> IA platform and the AMD Opteron, and it will be interesting to
>> know what happens on the Intel platform too.
>
> For a read-only comparison (which was run using the simple
> protocol), using identical settings to the previous master run, but
> with the PGPROC split patch:
>
> m32 tps = 201738.209348 (including connections establishing)
> p32 tps = 201620.966988 (including connections establishing)
>
> m128 tps = 352159.631878 (including connections establishing)
> p128 tps = 363998.703900 (including connections establishing)
>
> Clearly a win at 128 clients; not at 32.
>
> For updates:
>
> sm32 tps = 27392.393850 (including connections establishing)
> sp32 tps = 27995.784333 (including connections establishing)
>
> sm128 tps = 22261.902571 (including connections establishing)
> sp128 tps = 23690.408272 (including connections establishing)
>
> pm32 tps = 34983.352396 (including connections establishing)
> pp32 tps = 36076.373389 (including connections establishing)
>
> pm128 tps = 24164.441954 (including connections establishing)
> pp128 tps = 27070.824588 (including connections establishing)
>
> That's a pretty decisive win all around.

Thanks for running those tests. The numbers are not that bad, but
definitely not as good as we saw on some other platforms. But it's
possible that they may improve in percentage terms with an even
larger number of clients on this box. And given that we are seeing
big gains on other platforms, hopefully this will give us confidence
to proceed with the patch.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] testing ProcArrayLock patches
Pavan Deolasee wrote:

> It will be a great help if you could spare a few minutes to also
> test the patch to take out the frequently accessed PGPROC members
> to a different array. We are seeing good improvements on the HPUX
> IA platform and the AMD Opteron, and it will be interesting to
> know what happens on the Intel platform too.

For a read-only comparison (which was run using the simple
protocol), using identical settings to the previous master run, but
with the PGPROC split patch:

m32 tps = 201738.209348 (including connections establishing)
p32 tps = 201620.966988 (including connections establishing)

m128 tps = 352159.631878 (including connections establishing)
p128 tps = 363998.703900 (including connections establishing)

Clearly a win at 128 clients; not at 32.

For updates:

sm32 tps = 27392.393850 (including connections establishing)
sp32 tps = 27995.784333 (including connections establishing)

sm128 tps = 22261.902571 (including connections establishing)
sp128 tps = 23690.408272 (including connections establishing)

pm32 tps = 34983.352396 (including connections establishing)
pp32 tps = 36076.373389 (including connections establishing)

pm128 tps = 24164.441954 (including connections establishing)
pp128 tps = 27070.824588 (including connections establishing)

That's a pretty decisive win all around.

-Kevin
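The size of those wins can be made explicit with a quick awk sketch
over the master (m) versus patched (p) tps figures quoted in this
message:

```shell
# Percent change from master (m) to PGPROC-split patch (p); the tps
# figures are taken verbatim from the message above.
gain() { awk -v m="$1" -v p="$2" 'BEGIN { printf "%+.2f%%\n", (p/m - 1) * 100 }'; }

gain 352159.631878 363998.703900   # read-only, 128 clients
gain 22261.902571  23690.408272    # simple-protocol update, 128 clients
gain 24164.441954  27070.824588    # prepared-protocol update, 128 clients
```

That works out to roughly +3.4%, +6.4%, and +12.0% respectively at
128 clients, versus essentially no change at 32.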
Re: [HACKERS] testing ProcArrayLock patches
On Mon, Nov 21, 2011 at 11:01 PM, Kevin Grittner wrote:
> Pavan Deolasee wrote:
>
>> It will be a great help if you could spare a few minutes to also
>> test the patch to take out the frequently accessed PGPROC members
>> to a different array. We are seeing good improvements on the HPUX
>> IA platform and the AMD Opteron, and it will be interesting to
>> know what happens on the Intel platform too.
>>
>> http://archives.postgresql.org/message-id/4eb7c4c9.9070...@enterprisedb.com
>
> It's going to be hard to arrange more of the 20-hour runs I've been
> doing, but I can work in some more abbreviated tests. What would be
> the best test for this? (I would hate to try it and find out I
> didn't exercise the right code path.)

I think 2-3 runs with 32 and 128 clients each, with prepared
statements, should suffice to quickly compare with the other numbers
you posted for the master.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
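The abbreviated matrix Pavan suggests can be sketched as a small
driver script. It only echoes the pgbench invocations rather than
executing them; the -T value and the build-selection mechanism are
placeholders, not something specified in the thread:

```shell
# Sketch of the abbreviated comparison: 3 runs each at 32 and 128 clients,
# prepared statements, for the master and PGPROC-split builds.  Commands are
# echoed, not executed; -T 300 and the build labels are assumptions.
run_matrix() {
    for build in master pgproc-split; do
        for clients in 32 128; do
            for run in 1 2 3; do
                echo "[$build run $run] pgbench -M prepared -T 300 -c $clients -j $clients"
            done
        done
    done
}

run_matrix
```

For a real run, the echo would be replaced with the pgbench call
against the appropriate installation, with output appended to a
result file as in Kevin's earlier script.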
Re: [HACKERS] testing ProcArrayLock patches
Pavan Deolasee wrote:

> It will be a great help if you could spare a few minutes to also
> test the patch to take out the frequently accessed PGPROC members
> to a different array. We are seeing good improvements on the HPUX
> IA platform and the AMD Opteron, and it will be interesting to
> know what happens on the Intel platform too.
>
> http://archives.postgresql.org/message-id/4eb7c4c9.9070...@enterprisedb.com

It's going to be hard to arrange more of the 20-hour runs I've been
doing, but I can work in some more abbreviated tests. What would be
the best test for this? (I would hate to try it and find out I
didn't exercise the right code path.)

-Kevin
Re: [HACKERS] testing ProcArrayLock patches
On Mon, Nov 21, 2011 at 10:44 PM, Kevin Grittner wrote:
> "Kevin Grittner" wrote:
>
>> I can run one more set of tests tonight before I have to give it
>> back to the guy who's putting it into production. It sounds like
>> a set like the above except with synchronous_commit = off might
>> be desirable?
>
> OK, that's what I did. This gave me my best numbers yet for an
> updating run of pgbench: tps = 38039.724212 for prepared statements
> using the flexlock patch. This patch is a clear win when you get to
> 16 clients or more.

It will be a great help if you could spare a few minutes to also
test the patch to take out the frequently accessed PGPROC members to
a different array. We are seeing good improvements on the HPUX IA
platform and the AMD Opteron, and it will be interesting to know
what happens on the Intel platform too.

http://archives.postgresql.org/message-id/4eb7c4c9.9070...@enterprisedb.com

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] testing ProcArrayLock patches
"Kevin Grittner" wrote:

> I can run one more set of tests tonight before I have to give it
> back to the guy who's putting it into production. It sounds like a
> set like the above except with synchronous_commit = off might be
> desirable?

OK, that's what I did. This gave me my best numbers yet for an
updating run of pgbench: tps = 38039.724212 for prepared statements
using the flexlock patch. This patch is a clear win when you get to
16 clients or more.

sm1 tps = 1312.501168 (including connections establishing)
sf1 tps = 1376.678293 (including connections establishing)
sm2 tps = 2705.571856 (including connections establishing)
sf2 tps = 2689.577938 (including connections establishing)
sm4 tps = 5461.403557 (including connections establishing)
sf4 tps = 5447.363103 (including connections establishing)
sm8 tps = 10524.695338 (including connections establishing)
sf8 tps = 10448.012069 (including connections establishing)
sm16 tps = 18952.968472 (including connections establishing)
sf16 tps = 18969.505631 (including connections establishing)
sm32 tps = 27392.393850 (including connections establishing)
sf32 tps = 29225.974112 (including connections establishing)
sm64 tps = 28947.675549 (including connections establishing)
sf64 tps = 31417.536816 (including connections establishing)
sm80 tps = 28053.684182 (including connections establishing)
sf80 tps = 29970.555401 (including connections establishing)
sm96 tps = 25885.679957 (including connections establishing)
sf96 tps = 28581.271436 (including connections establishing)
sm128 tps = 22261.902571 (including connections establishing)
sf128 tps = 24537.566960 (including connections establishing)

pm1 tps = 2082.958841 (including connections establishing)
pf1 tps = 2052.328339 (including connections establishing)
pm2 tps = 4287.257860 (including connections establishing)
pf2 tps = 4228.770795 (including connections establishing)
pm4 tps = 8653.196863 (including connections establishing)
pf4 tps = 8592.091631 (including connections establishing)
pm8 tps = 16071.432101 (including connections establishing)
pf8 tps = 16196.992207 (including connections establishing)
pm16 tps = 27146.441216 (including connections establishing)
pf16 tps = 27441.966562 (including connections establishing)
pm32 tps = 34983.352396 (including connections establishing)
pf32 tps = 38039.724212 (including connections establishing)
pm64 tps = 33182.643501 (including connections establishing)
pf64 tps = 34193.732669 (including connections establishing)
pm80 tps = 30686.712607 (including connections establishing)
pf80 tps = 6.011769 (including connections establishing)
pm96 tps = 24692.015615 (including connections establishing)
pf96 tps = 32907.472665 (including connections establishing)
pm128 tps = 24164.441954 (including connections establishing)
pf128 tps = 25742.670928 (including connections establishing)

At lower client numbers the tps values within each set of five
samples were very tightly grouped. With either protocol, and whether
or not the patch was applied, the higher-concurrency groups tended
to be bifurcated within a set of five samples between "good" and
"bad" numbers. The patch seemed to increase the number of clients
which could be handled without collapse into the bad numbers. It
really looks like there's some sort of performance "collapse" at
higher concurrency which may or may not happen in any particular
five-minute run. Just as one example, running the simple protocol
with the flexlock patch:

tps = 24491.653873 (including connections establishing)
tps = 24537.566960 (including connections establishing)
tps = 28462.276323 (including connections establishing)
tps = 24403.373002 (including connections establishing)
tps = 28458.902549 (including connections establishing)

-Kevin
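The bifurcation Kevin describes is easy to quantify. A small awk
sketch over the five 128-client, simple-protocol flexlock samples
listed just above:

```shell
# Mean and spread of the five samples quoted in the message above; the gap
# between best and worst runs is the "good"/"bad" bifurcation described.
spread_stats() {
    printf '%s\n' "$@" |
    awk '{ sum += $1; if (NR == 1 || $1 < min) min = $1; if ($1 > max) max = $1 }
         END { printf "mean=%.0f min=%.0f max=%.0f spread=%.1f%%\n",
                      sum / NR, min, max, (max / min - 1) * 100 }'
}

spread_stats 24491.653873 24537.566960 28462.276323 24403.373002 28458.902549
```

The samples cluster into a "bad" group around 24400-24500 tps and a
"good" group around 28460 tps, about 16-17% apart.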
Re: [HACKERS] testing ProcArrayLock patches
Robert Haas wrote:

> I was actually thinking it would be interesting to oprofile the
> read-only test; see if we can figure out where those slowdowns are
> coming from.

CPU: Intel Core/i7, speed 2262 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with
a unit mask of 0x00 (No unit mask) count 10

samples  %       image name  symbol name
3124242  5.7137  postgres    s_lock
254      4.6737  postgres    AllocSetAlloc
2403412  4.3954  postgres    GetSnapshotData
1967132  3.5975  postgres    SearchCatCache
1872176  3.4239  postgres    base_yyparse
1327256  2.4273  postgres    hash_search_with_hash_value
1040131  1.9022  postgres    _bt_compare
1038976  1.9001  postgres    LWLockAcquire
817122   1.4944  postgres    MemoryContextAllocZeroAligned
738321   1.3503  postgres    core_yylex
622613   1.1386  postgres    MemoryContextAlloc
597054   1.0919  postgres    PinBuffer
556138   1.0171  postgres    ScanKeywordLookup
552318   1.0101  postgres    expression_tree_walker
494279   0.9039  postgres    LWLockRelease
488628   0.8936  postgres    hash_any
472906   0.8649  postgres    nocachegetattr
396482   0.7251  postgres    grouping_planner
382974   0.7004  postgres    LockAcquireExtended
375186   0.6861  postgres    AllocSetFree
375072   0.6859  postgres    ProcArrayLockRelease
373668   0.6834  postgres    new_list
365917   0.6692  postgres    fmgr_info_cxt_security
301398   0.5512  postgres    ProcArrayLockAcquire
300647   0.5498  postgres    LockReleaseAll
292073   0.5341  postgres    DirectFunctionCall1Coll
285745   0.5226  postgres    MemoryContextAllocZero
284684   0.5206  postgres    FunctionCall2Coll
282701   0.5170  postgres    SearchSysCache

max_connections = 100
max_pred_locks_per_transaction = 64
shared_buffers = 8GB
maintenance_work_mem = 1GB
checkpoint_segments = 300
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
wal_writer_delay = 20ms
seq_page_cost = 0.1
random_page_cost = 0.1
cpu_tuple_cost = 0.05
effective_cache_size = 40GB
default_transaction_isolation = '$iso'

pgbench -i -s 100
pgbench -S -M simple -T 300 -c 80 -j 80

transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 80
number of threads: 80
duration: 300 s
number of transactions actually processed: 104391011
tps = 347964.636256 (including connections establishing)
tps = 347976.389034 (excluding connections establishing)

vmstat 1 showed differently this time -- no clue why.

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b  swpd  free     buff       cache       si  so  bi     bo     in  cs      us sy id wa st
91  0  8196  4189436  203925700  5231449200  0   0   32255  1522807    85 13 1 0 0
92  0  8196  4189404  203925700  5231449200  0   0   32796  1525463    85 14 1 0 0
67  0  8196  4189404  203925700  5231448800  0   0   32343  1527988    85 13 1 0 0
93  0  8196  4189404  203925700  5231448800  0   0   32701  1535827    85 13 1 0 0

-Kevin
Re: [HACKERS] testing ProcArrayLock patches
Robert Haas wrote:

> Hmm. There's obviously something that's different in your
> environment or configuration from what I tested, but I don't know
> what it is. The fact that your scale factor is larger than
> shared_buffers might matter; or Intel vs. AMD. Or maybe you're
> running with synchronous_commit=on?

Yes, I had synchronous_commit = on for these runs. Here are the
settings:

cat >> $PGDATA/postgresql.conf
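The message above is truncated after the `cat >> $PGDATA/postgresql.conf`
line, but the settings it presumably appended are the ones Kevin lists in
his oprofile message elsewhere in this thread. A sketch of what that
append likely looked like; the PGDATA fallback and the concrete `iso`
value are placeholders added here, since $iso is supplied by the
surrounding test harness:

```shell
# Sketch only: reconstructs the truncated settings append using the values
# from Kevin's oprofile message in this thread.  The PGDATA default and the
# iso value are placeholder assumptions for demonstration.
PGDATA=${PGDATA:-/tmp/pgdata-sketch}
iso=${iso:-repeatable read}
mkdir -p "$PGDATA"
cat >> "$PGDATA/postgresql.conf" <<EOF
max_connections = 100
max_pred_locks_per_transaction = 64
shared_buffers = 8GB
maintenance_work_mem = 1GB
checkpoint_segments = 300
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
wal_writer_delay = 20ms
seq_page_cost = 0.1
random_page_cost = 0.1
cpu_tuple_cost = 0.05
effective_cache_size = 40GB
default_transaction_isolation = '$iso'
EOF
```

Note that the scale factor used in the write tests (-s 150, roughly
2GB of data) exceeds shared_buffers only in Robert's scenario above,
not here with shared_buffers = 8GB.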
Re: [HACKERS] testing ProcArrayLock patches
On Saturday, November 19, 2011 12:18:07 AM Kevin Grittner wrote:
> Andres Freund wrote:
> > I think opannotate -a -s produces output with instructions/code
> > intermingled.
>
> Thanks. I'll check out perf later (thanks for the tips!), but for
> now, here's the function which was at the top of my oprofile
> results, annotated with those options. I'm afraid it's a bit
> intimidating to me -- the last time I did much with X86 assembly
> language was in the mid-80s, on an 80286. :-/

While my assembly knowledge surely isn't from the 80s, be assured
that I find it intimidating as well ;)

> Hopefully, since this is at the top of the oprofile results when
> running with prepared statements, it will be of use to somebody.

I think in quite many situations hash_search_with_hash_value is
rather noticeable in the profiles, even without concurrency...

Looking at your annotation output, the code seems to be almost
entirely stalled waiting for memory. The first stall is after the
first reading memory access, which is likely to be uncached (the
first cacheline of the HTAB is accessed before that, but it will be
in the cache). The interesting thing is that I would have expected a
higher likelihood for this to stay in the cache.

 2225  0.0165 : 70b543: mov    (%rdi),%r15
              : static inline uint32
              : calc_bucket(HASHHDR *hctl, uint32 hash_val)
              : {
              :     uint32 bucket;
              :
              :     bucket = hash_val & hctl->high_mask;
 4544  0.0337 : 70b546: and    0x2c(%r15),%ebx
              :     if (bucket > hctl->max_bucket)
53409  0.3958 : 70b54a: cmp    0x28(%r15),%ebx
              : 70b54e: jbe    70b554

So a stall here is not that surprising. Here we fetch data from
memory which is unlikely to be prefetchable and then require the
result from that fetch. Note how "segp = hashp->dir[segment_num];"
is distributed over line 52, 64, 83.

              :     segp = hashp->dir[segment_num];
 2062  0.0153 : 70b562: shr    %cl,%eax
  309  0.0023 : 70b564: mov    %eax,%eax
  643  0.0048 : 70b566: mov    (%rdx,%rax,8),%rbp
              :
              :     if (segp == NULL)
43329  0.3211 : 70b56a: test   %rbp,%rbp

The next cacheline is referenced here. Again, a fetch from memory
which is soon after needed to continue. Unless I misunderstood the
code flow, this disproves my theory that we might have many
collisions, as that test seems to be outside the loop:

              :     prevBucketPtr = &segp[segment_ndx];
              :     currBucket = *prevBucketPtr;
  122 9.0e-04 : 70b586: mov    0x0(%rbp),%rbx
              :
              :     /*
              :      * Follow collision chain looking for matching key
              :      */
              :     match = hashp->match;        /* save one fetch in inner loop */
              :     keysize = hashp->keysize;    /* ditto */
99903  0.7404 : 70b58a: mov    %rax,0x18(%rsp)
              :
              :     while (currBucket != NULL)
 1066  0.0079 : 70b58f: test   %rbx,%rbx

Line 136 is the first time the contents of the current bucket are
needed. That's why the test is so noticeable.

              :     currBucket = *prevBucketPtr;
  655  0.0049 : 70b5a3: mov    (%rbx),%rbx
              :     /*
              :      * Follow collision chain looking for matching key
              :      */
              :     match = hashp->match;        /* save one fetch in inner loop */
              :     keysize = hashp->keysize;    /* ditto */
              :
              :     while (currBucket != NULL)
  608  0.0045 : 70b5a6: test   %rbx,%rbx
              : 70b5a9: je     70b5d0
              :     {
              :         if (currBucket->hashvalue == hashvalue &&
 3504  0.0260 : 70b5ab: cmp    %r12d,0x8(%rbx)
98486  0.7299 : 70b5af: nop
 1233  0.0091 : 70b5b0: jne    70b5a0

That covers all the slow points in the function. And unless I am
missing something, those are all the fetched cachelines of that
function... For HASH_FIND, that is.

So I think that reinforces my belief that ordinary cache misses are
the culprit here. Which is to be expected in a hashtable...

Andres

PS: No idea whether that rambling made sense to anyone... But I
looked at that function for the first time ;)
Re: [HACKERS] testing ProcArrayLock patches
On Fri, Nov 18, 2011 at 6:46 PM, Kevin Grittner wrote:
>>> tps = 21946.961196 (including connections establishing)
>>> tps = 22911.873227 (including connections establishing)
>>>
>>> For write transactions, that seems pretty respectable.
>>
>> Very. What do you get without the patch?
>
> [quick runs a couple tests that way]
>
> Single run with -M simple:
>
> tps = 23018.314292 (including connections establishing)
>
> Single run with -M prepared:
>
> tps = 27910.621044 (including connections establishing)
>
> So, the patch appears to hinder performance in this environment,
> although certainty is quite low with so few samples. I'll schedule
> a spectrum of runs before I leave this evening (very soon).

Hmm. There's obviously something that's different in your
environment or configuration from what I tested, but I don't know
what it is. The fact that your scale factor is larger than
shared_buffers might matter; or Intel vs. AMD. Or maybe you're
running with synchronous_commit=on?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] testing ProcArrayLock patches
Robert Haas wrote:

> Hmm. That looks a lot like a profile with no lock contention at
> all. Since I see XLogInsert in there, I assume this must be a
> pgbench write test on unlogged tables? How close am I?

Not unless pgbench on HEAD does that by default. Here are the
relevant statements:

$prefix/bin/pgbench -i -s 150
$prefix/bin/pgbench -T $time -c $clients -j $clients >>$resultfile

Perhaps the Intel cores implement the relevant primitives better?
Or maybe I didn't run the profile or the reports the right way?

> I was actually thinking it would be interesting to oprofile the
> read-only test; see if we can figure out where those slowdowns are
> coming from.

I'll plan on doing that this weekend.

>> tps = 21946.961196 (including connections establishing)
>> tps = 22911.873227 (including connections establishing)
>>
>> For write transactions, that seems pretty respectable.
>
> Very. What do you get without the patch?

[quick runs a couple tests that way]

Single run with -M simple:

tps = 23018.314292 (including connections establishing)

Single run with -M prepared:

tps = 27910.621044 (including connections establishing)

So, the patch appears to hinder performance in this environment,
although certainty is quite low with so few samples. I'll schedule a
spectrum of runs before I leave this evening (very soon).

-Kevin
Re: [HACKERS] testing ProcArrayLock patches
On Fri, Nov 18, 2011 at 2:05 PM, Kevin Grittner wrote:
> Robert Haas wrote:
>> Any chance you can run oprofile (on either branch, don't really
>> care) against the 32 client test and post the results?
>
> [ oprofile results ]

Hmm. That looks a lot like a profile with no lock contention at all.
Since I see XLogInsert in there, I assume this must be a pgbench
write test on unlogged tables? How close am I?

I was actually thinking it would be interesting to oprofile the
read-only test; see if we can figure out where those slowdowns are
coming from.

> Two runs:
>
> tps = 21946.961196 (including connections establishing)
> tps = 22911.873227 (including connections establishing)
>
> For write transactions, that seems pretty respectable.

Very. What do you get without the patch?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] testing ProcArrayLock patches
Andres Freund wrote:

> I think opannotate -a -s produces output with instructions/code
> intermingled.

Thanks. I'll check out perf later (thanks for the tips!), but for
now, here's the function which was at the top of my oprofile
results, annotated with those options. I'm afraid it's a bit
intimidating to me -- the last time I did much with X86 assembly
language was in the mid-80s, on an 80286. :-/ Hopefully, since this
is at the top of the oprofile results when running with prepared
statements, it will be of use to somebody. The instructions which
are shown as having that 1% still seem odd to me, but as you say,
they were probably actually waiting for some previous operation to
finish:

43329  0.3211 : 70b56a: test   %rbp,%rbp
99903  0.7404 : 70b58a: mov    %rax,0x18(%rsp)

If anyone wants any other detail from what I captured, let me know.

-Kevin

0070b520 <hash_search_with_hash_value>:
    /* hash_search_with_hash_value total: 495463 3.6718 */
              : hash_search_with_hash_value(HTAB *hashp,
              :                             const void *keyPtr,
              :                             uint32 hashvalue,
              :                             HASHACTION action,
              :                             bool *foundPtr)
              : {
 5023  0.0372 : 70b520: push   %r15
 5967  0.0442 : 70b522: push   %r14
 1407  0.0104 : 70b524: mov    %rdi,%r14
   30 2.2e-04 : 70b527: push   %r13
 2495  0.0185 : 70b529: push   %r12
 2631  0.0195 : 70b52b: mov    %edx,%r12d
   18 1.3e-04 : 70b52e: push   %rbp
 1277  0.0095 : 70b52f: push   %rbx
              : static inline uint32
              : calc_bucket(HASHHDR *hctl, uint32 hash_val)
              : {
              :     uint32 bucket;
              :
              :     bucket = hash_val & hctl->high_mask;
 2122  0.0157 : 70b530: mov    %edx,%ebx
              : hash_search_with_hash_value(HTAB *hashp,
              :                             const void *keyPtr,
              :                             uint32 hashvalue,
              :                             HASHACTION action,
              :                             bool *foundPtr)
              : {
  247  0.0018 : 70b532: sub    $0x58,%rsp
  236  0.0017 : 70b536: mov    %rsi,0x10(%rsp)
 3851  0.0285 : 70b53b: mov    %ecx,0xc(%rsp)
 2551  0.0189 : 70b53f: mov    %r8,(%rsp)
              :     HASHHDR *hctl = hashp->hctl;
 2225  0.0165 : 70b543: mov    (%rdi),%r15
              :     bucket = hash_val & hctl->high_mask;
 4544  0.0337 : 70b546: and    0x2c(%r15),%ebx
              :     if (bucket > hctl->max_bucket)
53409  0.3958 : 70b54a: cmp    0x28(%r15),%ebx
              : 70b54e: jbe    70b554
              :         bucket = bucket & hctl->low_mask;
 3324  0.0246 : 70b550: and    0x30(%r15),%ebx
              :     bucket = calc_bucket(hctl, hashvalue);
              :
              :     segment_num = bucket >> hashp->sshift;
              :     segment_ndx = MOD(bucket, hashp->ssize);
              :
              :     segp = hashp->dir[segment_num];
 9702  0.0719 : 70b554: mov    0x58(%r14),%ecx
 2428  0.0180 : 70b558: mov    %ebx,%eax
  489  0.0036 : 70b55a: mov    0x8(%r14),%rdx
              :     segment_ndx = MOD(bucket, hashp->ssize);
  391  0.0029 : 70b55e: mov    0x50(%r14),%r13
              :
              :     segp = hashp->dir[segment_num];
 2062  0.0153 : 70b562: shr    %cl,%eax
  309  0.0023 : 70b564: mov    %eax,%eax
  643  0.0048 : 70b566: mov    (%rdx,%rax,8),%rbp
              :
              :     if (segp == NULL)
43329  0.3211 : 70b56a: test   %rbp,%rbp
 1284  0.0095 : 70b56d: je     70b727
              :         hash_corrupted(hashp);
              :
              :     prevBucketPtr = &segp[segment_ndx];
 1878  0.0139 : 70b573: lea    -0x1(%r13),%rax
              :     currBucket = *prevBucketPtr;
              :
              :     /*
              :      * Follow collision chain looking for matching key
              :      */
              :     match = hashp->match;
Re: [HACKERS] testing ProcArrayLock patches
On Friday, November 18, 2011 11:12:02 PM Andres Freund wrote:
> On Friday, November 18, 2011 09:16:01 PM Kevin Grittner wrote:
> > Andres Freund wrote:
> > > When doing line-level profiles I would suggest looking at the
> > > instructions.
> >
> > What's the best way to do that?
>
> I think opannotate -a -s produces output with instructions/code
> intermingled.
>
> > > I don't think cache line contention is the most likely candidate
> > > here. Simple cache-misses seem far more likely. In combination
> > > with pipeline stalls...
> > >
> > > Newer cpus (nehalem+) can measure stalled cycles which can be
> > > really useful when analyzing performance. I don't remember how to
> > > do that with oprofile right now though as I use perf these days
> > > (its -e stalled-cycles-{frontend|backend} there).
> >
> > When I run oprofile, I still always go back to this post by Tom:
> > http://archives.postgresql.org/pgsql-performance/2009-06/msg00154.php
>
> Hrm. I am on the train and for unknown reasons the only sensibly
> working protocols are smtp + pop. Waiting... Waiting... Sorry, too
> slow/high latency atm. I wrote everything below and another mail
> and the page still hasn't loaded.
>
> oprofile can produce graphs as well (--callgraph). For both tools
> you need -fno-omit-frame-pointer to get usable graphs.
>
> > Can anyone provide such a "cheat sheet" for perf? I could give
> > that a try if I knew how.
>
> Unfortunately for sensible results the kernel needs to be rather
> new. I would say > 2.6.28 or so (just guessed).
>
> # to record activity
> perf record [-g|--call-graph] program|-p pid
>
> # to view a summation
> perf report
>
> # get heaps of stats from something
> perf stat -ddd someprogram|-p pid
>
> # show what the system is executing overall
> perf top -az
>
> # get help
> perf help (record|report|annotate|stat|...)
> ...

I forgot that there is also:

# get a list of event types
perf list

# measure something for a specific event
perf (record|stat|top) -e some_event_type

Andres
Re: [HACKERS] testing ProcArrayLock patches
On Friday, November 18, 2011 09:16:01 PM Kevin Grittner wrote:
> Andres Freund wrote:
> > When doing line-level profiles I would suggest looking at the
> > instructions.
>
> What's the best way to do that?

I think opannotate -a -s produces output with instructions/code
intermingled.

> > I don't think cache line contention is the most likely candidate
> > here. Simple cache-misses seem far more likely. In combination
> > with pipeline stalls...
> >
> > Newer cpus (nehalem+) can measure stalled cycles which can be
> > really useful when analyzing performance. I don't remember how to
> > do that with oprofile right now though as I use perf these days
> > (its -e stalled-cycles-{frontend|backend} there).
>
> When I run oprofile, I still always go back to this post by Tom:
> http://archives.postgresql.org/pgsql-performance/2009-06/msg00154.php

Hrm. I am on the train and for unknown reasons the only sensibly
working protocols are smtp + pop. Waiting... Waiting... Sorry, too
slow/high latency atm. I wrote everything below and another mail and
the page still hasn't loaded.

oprofile can produce graphs as well (--callgraph). For both tools
you need -fno-omit-frame-pointer to get usable graphs.

> Can anyone provide such a "cheat sheet" for perf? I could give that
> a try if I knew how.

Unfortunately for sensible results the kernel needs to be rather
new. I would say > 2.6.28 or so (just guessed).

# to record activity
perf record [-g|--call-graph] program|-p pid

# to view a summation
perf report

graph:

# Overhead  Command   Shared Object  Symbol
# ........  ........  .............  ......
#
     4.09%  postgres  postgres       [.] slab_alloc_dyn
            |
            --- slab_alloc_dyn
               |
               |--18.52%-- new_list
               |          |
               |          |--63.79%-- lappend
               |          |          |
               |          |          |--13.40%-- find_usable_indexes
               |          |          |           create_index_paths
               |          |          |           set_rel_pathlist
               |          |          |           make_one_rel

flat:

# Overhead  Command   Shared Object  Symbol
# ........  ........  .............  ......
#
     5.10%  postgres  [vdso]         [.] 0x73d8d770
     4.26%  postgres  postgres       [.] base_yyparse
     3.88%  postgres  postgres       [.] slab_alloc_dyn
     2.82%  postgres  postgres       [.] core_yylex
     2.37%  postgres  postgres       [.] SearchCatCache
     1.85%  postgres  libc-2.13.so   [.] __memcpy_ssse3
     1.66%  postgres  libc-2.13.so   [.] __GI___strcmp_ssse3
     1.23%  postgres  postgres       [.] MemoryContextAlloc

# to view a line/source/instruction level view
perf annotate -l symbol

         :     /*
         :      * one-time startup overhead for each cache
         :      */
         :     if (cache->cc_tupdesc == NULL)
    0.35 : 6e81fd: 48 83 7f 28 00       cmpq   $0x0,0x28(%rdi)
    /home/andres/src/postgresql/build/optimize/../../src/backend/utils/cache/catcache.c:1070
    4.15 : 6e8202: 0f 84 54 04 00 00    je     6e865c
         :     #endif
         :
         :     /*
         :      * initialize the search key information
         :      */
         :     memcpy(cur_skey, cache->cc_skey, sizeof(cur_skey));
    0.00 : 6e8208: 48 8d bd a0 fe ff ff lea    -0x160(%rbp),%rdi
    0.17 : 6e820f: 49 8d 77 70          lea    0x70(%r15),%rsi
    0.00 : 6e8213: b9 24 00 00 00       mov    $0x24,%ecx
    /home/andres/src/postgresql/build/optimize/../../src/backend/utils/cache/catcache.c:1080
   33.22 : 6e8218: f3 48 a5             rep movsq %ds:(%rsi),%es:(%rdi)
         :     cur_skey[0].sk_argument = v1;
    /home/andres/src/postgresql/build/optimize/../../src/backend/utils/cache/catcache.c:1081
    1.56 : 6e821b: 48 89 9d e0 fe ff ff mov    %rbx,-0x120(%rbp)
...

# get heaps of stats from something
perf stat -ddd someprogram|-p pid

   1242.409965 task-clock               # 0.824 CPUs utilized        [100.00%]
        14,572 context-switches         # 0.012 M/sec                [100.00%]
           264 CPU-migrations           # 0.000 M/sec                [100.00%]
             0 page-faults              # 0.000 M/sec
 2,854,775,135 cycles                   # 2.298 GHz                  [26.28%]
               stalled-cycles-frontend
               stalled-cycles-backend
 2,024,997,785 instructions             # 0.71 insns per cycle       [25.25%]
   387,240,903 bran
Re: [HACKERS] testing ProcArrayLock patches
Andres Freund wrote: > When doing line-level profiles I would suggest looking at the > instructions. What's the best way to do that? > I don't think cache line contention is the most likely candidate > here. Simple cache-misses seem far more likely. In combination > with pipeline stalls... > > Newer cpus (nehalem+) can measure stalled cycles which can be > really useful when analyzing performance. I don't remember how to > do that with oprofile right now though as I use perf these days > (its -e stalled-cycles{frontend|backend} there}). When I run oprofile, I still always go back to this post by Tom: http://archives.postgresql.org/pgsql-performance/2009-06/msg00154.php Can anyone provide such a "cheat sheet" for perf? I could give that a try if I knew how. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing ProcArrayLock patches
On Friday, November 18, 2011 08:36:59 PM Kevin Grittner wrote:
> "Kevin Grittner" wrote:
> > samples  %       image name  symbol name
> > 495463   3.6718  postgres    hash_search_with_hash_value
>
> When lines like these show up in the annotated version, I'm
> impressed that we're still finding gains as big as we are:
>
>  44613  0.3306 :    if (segp == NULL)
>                :        hash_corrupted(hashp);
> 101910  0.7552 :    keysize = hashp->keysize;  /* ditto */

When doing line-level profiles I would suggest looking at the instructions. Quite often the line shown doesn't have much to do with what is actually executed, because the compiler tries to schedule instructions cleverly. Also, in many situations the cost shown doesn't actually lie in the instruction displayed but in some previous one: the displayed instruction has to wait for the result of the earlier instructions, and pipelining makes that hard to observe correctly.

A simplified example would be something like:

bool
func(int a, int b, int c)
{
    int res = a / b;

    if (res == c)
        return true;
    return false;
}

Likely the instruction showing up in the profile would be the comparison, which obviously is not the really expensive part -- the division is.

> There goes over 1% of my server run time, right there!
>
> Of course, these make no sense unless there is cache line
> contention, which is why that area is bearing fruit.

I don't think cache line contention is the most likely candidate here. Simple cache misses seem far more likely, in combination with pipeline stalls...

Newer CPUs (Nehalem+) can measure stalled cycles, which can be really useful when analyzing performance. I don't remember how to do that with oprofile right now though, as I use perf these days (it's -e stalled-cycles-{frontend|backend} there).

Andres
Re: [HACKERS] testing ProcArrayLock patches
Robert Haas wrote:
>> I think so. My take was that it was showing 32 of 64 *threads*
>> active -- the hyperthreading funkiness. Is there something in
>> particular you'd like me to check?
>
> Not really, just don't understand the number.

I'm having trouble resolving the vmstat numbers I got during the 32-client pgbench runs which modified data.

-M simple:

procs -----------memory---------- ---swap-- -----io---- ---system--- -----cpu------
 r  b  swpd   free     buff       cache    si  so  bi      bo      in       cs  us sy id wa st
30  1  4464  513492  205564572  54472124   0   0   0   78170  621724  1246300  30  8 61  1  0
27  1  4464  509288  205564572  54474600   0   0   0  125620  599403  1192046  29  8 63  1  0
35  1  4464  508368  205564572  54476996   0   0   0   89801  595939  1186496  29  8 63  0  0
25  0  4464  506088  205564572  54478668   0   0   0   90121  594800  1189649  28  8 63  0  0

-M prepared:

procs -----------memory---------- ---swap-- -----io---- ---system--- -----cpu------
 r  b  swpd   free     buff       cache    si  so  bi      bo      in       cs  us sy id wa st
28  0  5612  1204404  205107344  54230536  0   0   0   93212  527284  1456417  22  9 69  0  0
 8  1  5612  1202044  205107344  542600    0       0   93217  512819  1417457  21  9 70  1  0
17  1  5612  1201892  205107344  54236048  0   0   0  132699  502333  1412878  21  9 70  0  0
19  1  5612  1199208  205107344  54238936  0   0   0   93612  519113  1484386  21  9 69  0  0

So 60% or 70% idle without any I/O wait time. I don't know how to explain that.

-Kevin
Re: [HACKERS] testing ProcArrayLock patches
"Kevin Grittner" wrote:
> samples  %       image name  symbol name
> 495463   3.6718  postgres    hash_search_with_hash_value

When lines like these show up in the annotated version, I'm impressed that we're still finding gains as big as we are:

 44613  0.3306 :    if (segp == NULL)
               :        hash_corrupted(hashp);
101910  0.7552 :    keysize = hashp->keysize;  /* ditto */

There goes over 1% of my server run time, right there!

Of course, these make no sense unless there is cache line contention, which is why that area is bearing fruit.

-Kevin
Re: [HACKERS] testing ProcArrayLock patches
"anara...@anarazel.de" wrote:
> Kevin Grittner schrieb:
>> samples  %       image name  symbol name
>> 933394   4.9651  postgres    AllocSetAlloc
>> 848476   4.5134  postgres    base_yyparse
>> 719515   3.8274  postgres    SearchCatCache

> That profile looks like you ran pgbench with -m simple. How does
> it look with prepared instead?

samples  %       image name  symbol name
495463   3.6718  postgres    hash_search_with_hash_value
490971   3.6385  postgres    GetSnapshotData
443965   3.2902  postgres    LWLockAcquire
443566   3.2872  postgres    AllocSetAlloc
302388   2.2409  postgres    XLogInsert
286889   2.1261  postgres    SearchCatCache
246417   1.8262  postgres    PostgresMain
235018   1.7417  postgres    heap_page_prune
198442   1.4706  postgres    _bt_compare
181446   1.3447  postgres    hash_any
177131   1.3127  postgres    ExecInitExpr
175775   1.3026  postgres    LWLockRelease
152324   1.1288  postgres    PinBuffer
150285   1.1137  postgres    exec_bind_message
145214   1.0762  postgres    fmgr_info_cxt_security
140493   1.0412  postgres    s_lock
124162   0.9201  postgres    LockAcquireExtended
120429   0.8925  postgres    MemoryContextAlloc
117076   0.8676  postgres    pfree
116493   0.8633  postgres    AllocSetFree
105027   0.7783  postgres    pgstat_report_activity
101407   0.7515  postgres    ProcArrayLockAcquire
100797   0.7470  postgres    MemoryContextAllocZeroAligned
 98360   0.7289  postgres    ProcArrayLockRelease
 86938   0.6443  postgres    heap_hot_search_buffer
 82635   0.6124  postgres    hash_search
 79902   0.5921  postgres    errstart
 79465   0.5889  postgres    HeapTupleSatisfiesVacuum
 78709   0.5833  postgres    ResourceOwnerReleaseInternal
 76068   0.5637  postgres    ExecModifyTable
 73043   0.5413  postgres    heap_update
 72175   0.5349  postgres    strlcpy
 71253   0.5280  postgres    MemoryContextAllocZero

tps = 27392.219364 (including connections establishing)

-Kevin
Re: [HACKERS] testing ProcArrayLock patches
Kevin Grittner schrieb:
>Robert Haas wrote:
>
>> Any chance you can run oprofile (on either branch, don't really
>> care) against the 32 client test and post the results?
>
>Besides the other changes we discussed, I boosted scale to 150 and
>ran at READ COMMITTED isolation level (because all threads promptly
>crashed and burned at REPEATABLE READ -- we desperately need a
>pgbench option to retry a transaction on serialization failure).
>The oprofile hot spots at half a percent or higher:
>
>CPU: Intel Core/i7, speed 2262 MHz (estimated)
>Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with
>a unit mask of 0x00 (No unit mask) count 10
>samples  %       image name  symbol name
>933394   4.9651  postgres    AllocSetAlloc
>848476   4.5134  postgres    base_yyparse
>719515   3.8274  postgres    SearchCatCache
>461275   2.4537  postgres    hash_search_with_hash_value
>426411   2.2682  postgres    GetSnapshotData
>322938   1.7178  postgres    LWLockAcquire
>322236   1.7141  postgres    core_yylex
>305471   1.6249  postgres    MemoryContextAllocZeroAligned
>281543   1.4976  postgres    expression_tree_walker
>270241   1.4375  postgres    XLogInsert
>234899   1.2495  postgres    MemoryContextAlloc
>210137   1.1178  postgres    ScanKeywordLookup
>184857   0.9833  postgres    heap_page_prune
>173608   0.9235  postgres    hash_any
>153011   0.8139  postgres    _bt_compare
>144538   0.7689  postgres    nocachegetattr
>131466   0.6993  postgres    fmgr_info_cxt_security
>131001   0.6968  postgres    grouping_planner
>130808   0.6958  postgres    LWLockRelease
>124112   0.6602  postgres    PinBuffer
>120745   0.6423  postgres    LockAcquireExtended
>112992   0.6010  postgres    ExecInitExpr
>112830   0.6002  postgres    lappend
>112311   0.5974  postgres    new_list
>110368   0.5871  postgres    check_stack_depth
>106036   0.5640  postgres    AllocSetFree
>102565   0.5456  postgres    MemoryContextAllocZero
> 94689   0.5037  postgres    SearchSysCache

That profile looks like you ran pgbench with -m simple. How does it look with prepared instead?
Andres
Re: [HACKERS] testing ProcArrayLock patches
Robert Haas wrote:
> Any chance you can run oprofile (on either branch, don't really
> care) against the 32 client test and post the results?

Besides the other changes we discussed, I boosted scale to 150 and ran at READ COMMITTED isolation level (because all threads promptly crashed and burned at REPEATABLE READ -- we desperately need a pgbench option to retry a transaction on serialization failure). The oprofile hot spots at half a percent or higher:

CPU: Intel Core/i7, speed 2262 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 10

samples  %       image name  symbol name
933394   4.9651  postgres    AllocSetAlloc
848476   4.5134  postgres    base_yyparse
719515   3.8274  postgres    SearchCatCache
461275   2.4537  postgres    hash_search_with_hash_value
426411   2.2682  postgres    GetSnapshotData
322938   1.7178  postgres    LWLockAcquire
322236   1.7141  postgres    core_yylex
305471   1.6249  postgres    MemoryContextAllocZeroAligned
281543   1.4976  postgres    expression_tree_walker
270241   1.4375  postgres    XLogInsert
234899   1.2495  postgres    MemoryContextAlloc
210137   1.1178  postgres    ScanKeywordLookup
184857   0.9833  postgres    heap_page_prune
173608   0.9235  postgres    hash_any
153011   0.8139  postgres    _bt_compare
144538   0.7689  postgres    nocachegetattr
131466   0.6993  postgres    fmgr_info_cxt_security
131001   0.6968  postgres    grouping_planner
130808   0.6958  postgres    LWLockRelease
124112   0.6602  postgres    PinBuffer
120745   0.6423  postgres    LockAcquireExtended
112992   0.6010  postgres    ExecInitExpr
112830   0.6002  postgres    lappend
112311   0.5974  postgres    new_list
110368   0.5871  postgres    check_stack_depth
106036   0.5640  postgres    AllocSetFree
102565   0.5456  postgres    MemoryContextAllocZero
 94689   0.5037  postgres    SearchSysCache

Do you want line numbers or lower percentages?

Two runs:

tps = 21946.961196 (including connections establishing)
tps = 22911.873227 (including connections establishing)

For write transactions, that seems pretty respectable.
-Kevin
Re: [HACKERS] testing ProcArrayLock patches
Robert Haas wrote: > Yeah, I'd just drop -S. Easily done. > Make sure to use -c N -j N with pgbench, or you'll probably not be > able to saturate it. Yeah, that's part of the script I copied from you. > I've also had good luck with wal_writer_delay=20ms, although if > you have synchronous_commit=on that might not matter, and it's > much less important since Simon's recent patch in that area went > in. What the heck; will do. > What scale factor are you testing at? 100. Perhaps I should boost that since I'm going as far as 128 clients? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing ProcArrayLock patches
On Fri, Nov 18, 2011 at 12:45 PM, Kevin Grittner wrote: > OK. Sorry for misunderstanding that. I haven't gotten around to a > deep reading of the patch yet. :-( I based this on the test script > you posted here (with slight modifications for my preferred > directory structures): > > http://archives.postgresql.org/pgsql-hackers/2011-10/msg00605.php > > If I just drop the -S switch will I have a good test, or are there > other adjustments I should make (besides increasing checkpoint > segments)? (Well, for the SELECT-only test I didn't bother putting > pg_xlog on a separate RAID 10 on it's own BBU controller as we > normally would for this machine, I'll cover that, too.) Yeah, I'd just drop -S. Make sure to use -c N -j N with pgbench, or you'll probably not be able to saturate it. I've also had good luck with wal_writer_delay=20ms, although if you have synchronous_commit=on that might not matter, and it's much less important since Simon's recent patch in that area went in. What scale factor are you testing at? >> It doesn't make any sense for PostgreSQL master to be using only >> 50% of the CPU and leaving the rest idle on a lots-of-clients >> SELECT-only test. That could easily happen on 9.1, but my lock >> manager changes eliminated the only place where anything gets put >> to sleep in that path (except for the emergency sleeps done by >> s_lock, when a spinlock is really badly contended). So I'm >> confused by these results. Are we sure that the processes are >> being scheduled across all 32 physical cores? > > I think so. My take was that it was showing 32 of 64 *threads* > active -- the hyperthreading funkiness. Is there something in > particular you'd like me to check? Not really, just don't understand the number. >> At any rate, I do think it's likely that you're being bitten by >> spinlock contention, but we'd need to do some legwork to verify >> that and work out the details. 
Any chance you can run oprofile >> (on either branch, don't really care) against the 32 client test >> and post the results? If it turns out s_lock is at the top of the >> heap, I can put together a patch to help figure out which spinlock >> is the culprit. > > oprofile isn't installed on this machine. I'll take care of that > and post results when I can. OK. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
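For readers trying to reproduce the runs being discussed, the shape of the commands might look like the following. This is an illustrative sketch only (database name, scale, client count, and duration are assumptions, not the exact script from the thread); the flags themselves are standard pgbench options.

```shell
# one-time initialization at scale factor 150
# (each scale unit is 100,000 pgbench_accounts rows, so ~15M rows here)
pgbench -i -s 150 pgbench

# read-write run: drop -S as Robert suggests, and use -j to match -c
# so the pgbench driver itself can keep the clients busy
pgbench -c 32 -j 32 -T 300 -M prepared pgbench

# read-only variant keeps -S (this is the SELECT-only test)
pgbench -c 32 -j 32 -T 300 -S -M prepared pgbench
```

Swapping -M prepared for -M simple reproduces the simple-protocol numbers, which is the difference Andres's profile question turns on.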
Re: [HACKERS] testing ProcArrayLock patches
Robert Haas wrote:
> Kevin Grittner wrote:
>>> Then again, is this a regular pgbench test or is this
>>> SELECT-only?
>>
>> SELECT-only
>
> Ah, OK. I would not expect flexlocks to help with that; Pavan's
> patch might, though.

OK. Sorry for misunderstanding that. I haven't gotten around to a deep reading of the patch yet. :-( I based this on the test script you posted here (with slight modifications for my preferred directory structures):

http://archives.postgresql.org/pgsql-hackers/2011-10/msg00605.php

If I just drop the -S switch will I have a good test, or are there other adjustments I should make (besides increasing checkpoint segments)? (Well, for the SELECT-only test I didn't bother putting pg_xlog on a separate RAID 10 on its own BBU controller as we normally would for this machine; I'll cover that, too.)

> It doesn't make any sense for PostgreSQL master to be using only
> 50% of the CPU and leaving the rest idle on a lots-of-clients
> SELECT-only test. That could easily happen on 9.1, but my lock
> manager changes eliminated the only place where anything gets put
> to sleep in that path (except for the emergency sleeps done by
> s_lock, when a spinlock is really badly contended). So I'm
> confused by these results. Are we sure that the processes are
> being scheduled across all 32 physical cores?

I think so. My take was that it was showing 32 of 64 *threads* active -- the hyperthreading funkiness. Is there something in particular you'd like me to check?

> At any rate, I do think it's likely that you're being bitten by
> spinlock contention, but we'd need to do some legwork to verify
> that and work out the details. Any chance you can run oprofile
> (on either branch, don't really care) against the 32 client test
> and post the results? If it turns out s_lock is at the top of the
> heap, I can put together a patch to help figure out which spinlock
> is the culprit.

oprofile isn't installed on this machine. I'll take care of that and post results when I can.
-Kevin
Re: [HACKERS] testing ProcArrayLock patches
On Fri, Nov 18, 2011 at 12:03 PM, Kevin Grittner wrote: >> Then again, is this a regular pgbench test or is this SELECT-only? > > SELECT-only Ah, OK. I would not expect flexlocks to help with that; Pavan's patch might, though. >> Can you by any chance check top or vmstat during the 32-client >> test and see what percentage you have of user time/system >> time/idle time? > > You didn't say whether you wanted master or flexlock, but it turned > out that any difference was way too far into the noise to show. > They both looked like this: > > procs --memory- ---swap-- -io > r b swpd free buff cache si so bi bo > system -cpu-- > in cs us sy id wa st > 38 0 352 1157400 207177020 52360472 0 0 0 16 > 13345 1190230 40 7 53 0 0 > 37 0 352 1157480 207177020 52360472 0 0 0 0 > 12953 1263310 40 8 52 0 0 > 36 0 352 1157484 207177020 52360472 0 0 0 0 > 13411 1233365 38 7 54 0 0 > 37 0 352 1157476 207177020 52360472 0 0 0 0 > 12780 1193575 41 7 51 0 0 > > Keep in mind that while there are really 32 cores, the cpu > percentages seem to be based on the "threads" from hyperthreading. > Top showed pgbench (running on the same machine) as eating a pretty > steady 5.2 of the cores, leaving 26.8 cores to actually drive the 32 > postgres processes. It doesn't make any sense for PostgreSQL master to be using only 50% of the CPU and leaving the rest idle on a lots-of-clients SELECT-only test. That could easily happen on 9.1, but my lock manager changes eliminated the only place where anything gets put to sleep in that path (except for the emergency sleeps done by s_lock, when a spinlock is really badly contended). So I'm confused by these results. Are we sure that the processes are being scheduled across all 32 physical cores? At any rate, I do think it's likely that you're being bitten by spinlock contention, but we'd need to do some legwork to verify that and work out the details. 
Any chance you can run oprofile (on either branch, don't really care) against the 32 client test and post the results? If it turns out s_lock is at the top of the heap, I can put together a patch to help figure out which spinlock is the culprit. Anyway, this is probably a digression as it relates to FlexLocks: those are not optimizing for a read-only workload. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing ProcArrayLock patches
"Kevin Grittner" wrote: > We have a 32-core Intel box (4 x X7560 @ 2.27GHz) with 256 GB > RAM. In case anyone cares, this is the same box for which I posted STREAM test results a while back. The PostgreSQL tests seem to peak on this 32-core box at 64 clients, while the STREAM test of raw RAM speed kept increasing up to 128 clients. Overall, though, it's impressive how close PostgreSQL is now coming to the raw RAM access speed curve. http://archives.postgresql.org/pgsql-hackers/2011-08/msg01306.php -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing ProcArrayLock patches
Robert Haas wrote:
> Then again, is this a regular pgbench test or is this SELECT-only?

SELECT-only

> Can you by any chance check top or vmstat during the 32-client
> test and see what percentage you have of user time/system
> time/idle time?

You didn't say whether you wanted master or flexlock, but it turned out that any difference was way too far into the noise to show. They both looked like this:

procs -----------memory---------- ---swap-- -----io---- ---system--- -----cpu------
 r  b  swpd   free     buff       cache    si  so  bi  bo     in       cs  us sy id wa st
38  0  352   1157400  207177020  52360472  0   0   0  16  13345  1190230  40  7 53  0  0
37  0  352   1157480  207177020  52360472  0   0   0   0  12953  1263310  40  8 52  0  0
36  0  352   1157484  207177020  52360472  0   0   0   0  13411  1233365  38  7 54  0  0
37  0  352   1157476  207177020  52360472  0   0   0   0  12780  1193575  41  7 51  0  0

Keep in mind that while there are really 32 cores, the cpu percentages seem to be based on the "threads" from hyperthreading. Top showed pgbench (running on the same machine) as eating a pretty steady 5.2 of the cores, leaving 26.8 cores to actually drive the 32 postgres processes.

> What OS are you running?

Linux new-CIR 2.6.32.43-0.4-default #1 SMP 2011-07-14 14:47:44 +0200 x86_64 x86_64 x86_64 GNU/Linux

SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 1

-Kevin
Re: [HACKERS] testing ProcArrayLock patches
On Fri, Nov 18, 2011 at 11:26 AM, Kevin Grittner wrote:
> Robert Haas wrote:
>> Nate Boley's AMD 6128 box (which has 32 cores) and an HP Integrity
>> server (also with 32 cores).
>
>> [clear improvement with flexlock patch]
>
> Hmm. We have a 32-core Intel box (4 x X7560 @ 2.27GHz) with 256 GB
> RAM. It's about a week from going into production, at which point
> it will be extremely hard to schedule such tests, but for a few days
> more I've got shots at it. The flexlock patch doesn't appear to be
> such a clear win here.
>
> I started from Robert's tests, but used these settings so that I
> could go to higher client counts and better test serializable
> transactions. Everything is fully cached.
>
> max_connections = 200
> max_pred_locks_per_transaction = 256
> shared_buffers = 8GB
> maintenance_work_mem = 1GB
> checkpoint_segments = 30
> checkpoint_timeout = 15min
> checkpoint_completion_target = 0.9
> seq_page_cost = 0.1
> random_page_cost = 0.1
> cpu_tuple_cost = 0.05
> effective_cache_size = 40GB
> default_transaction_isolation = '$iso'

I had a dismaying benchmarking experience recently that involved settings very similar to the ones you've got there -- in particular, I also had checkpoint_segments set to 30. When I raised it to 300, performance improved dramatically at 8 clients and above.

Then again, is this a regular pgbench test or is this SELECT-only? Because the absolute numbers you're posting are vastly higher than anything I've ever seen on a write test.

Can you by any chance check top or vmstat during the 32-client test and see what percentage you have of user time/system time/idle time? What OS are you running?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] testing ProcArrayLock patches
Robert Haas wrote:
> Nate Boley's AMD 6128 box (which has 32 cores) and an HP Integrity
> server (also with 32 cores).

> [clear improvement with flexlock patch]

Hmm. We have a 32-core Intel box (4 x X7560 @ 2.27GHz) with 256 GB RAM. It's about a week from going into production, at which point it will be extremely hard to schedule such tests, but for a few days more I've got shots at it. The flexlock patch doesn't appear to be such a clear win here.

I started from Robert's tests, but used these settings so that I could go to higher client counts and better test serializable transactions. Everything is fully cached.

max_connections = 200
max_pred_locks_per_transaction = 256
shared_buffers = 8GB
maintenance_work_mem = 1GB
checkpoint_segments = 30
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
seq_page_cost = 0.1
random_page_cost = 0.1
cpu_tuple_cost = 0.05
effective_cache_size = 40GB
default_transaction_isolation = '$iso'

Serializable results not shown here -- that's to gather information for trying to improve SSI locking.
m1 tps = 7847.834544 (including connections establishing)
f1 tps = 7917.225382 (including connections establishing)

m2 tps = 18672.145526 (including connections establishing)
f2 tps = 17486.435322 (including connections establishing)

m4 tps = 34371.278253 (including connections establishing)
f4 tps = 34465.898173 (including connections establishing)

m8 tps = 68228.261694 (including connections establishing)
f8 tps = 68505.285830 (including connections establishing)

m16 tps = 127449.815100 (including connections establishing)
f16 tps = 127208.939670 (including connections establishing)

m32 tps = 201738.209348 (including connections establishing)
f32 tps = 201637.237903 (including connections establishing)

m64 tps = 380326.800557 (including connections establishing)
f64 tps = 380628.429408 (including connections establishing)

m80 tps = 366628.197546 (including connections establishing)
f80 tps = 162594.012051 (including connections establishing)

m96 tps = 360922.948775 (including connections establishing)
f96 tps = 366728.987041 (including connections establishing)

m128 tps = 352159.631878 (including connections establishing)
f128 tps = 355475.129448 (including connections establishing)

I did five runs each and took the median. In most cases, the values were pretty close to one another in a group, so confidence is pretty high that this is meaningful. There were a few anomalies where performance for one or more samples was horrid. This seems consistent with the theory of pathological pileups on the LW locks (or also flexlocks?).
The problem groups:

m64 tps = 380407.768906 (including connections establishing)
m64 tps = 79197.470389 (including connections establishing)
m64 tps = 381112.194105 (including connections establishing)
m64 tps = 378579.036542 (including connections establishing)
m64 tps = 380326.800557 (including connections establishing)

m96 tps = 360582.945291 (including connections establishing)
m96 tps = 363021.805138 (including connections establishing)
m96 tps = 362468.870516 (including connections establishing)
m96 tps = 59614.322351 (including connections establishing)
m96 tps = 360922.948775 (including connections establishing)

f80 tps = 158905.149822 (including connections establishing)
f80 tps = 157192.460599 (including connections establishing)
f80 tps = 370757.790443 (including connections establishing)
f80 tps = 162594.012051 (including connections establishing)
f80 tps = 372170.638516 (including connections establishing)

f96 tps = 366804.733788 (including connections establishing)
f96 tps = 366728.987041 (including connections establishing)
f96 tps = 365490.380848 (including connections establishing)
f96 tps = 366770.193305 (including connections establishing)
f96 tps = 125225.371140 (including connections establishing)

So the lows don't seem to be as low when they happen with the flexlock patch, but they still happen -- possibly more often?

-Kevin