Re: [HACKERS] testing ProcArrayLock patches
On Saturday, November 19, 2011 12:18:07 AM Kevin Grittner wrote:
> Andres Freund wrote:
> > I think opannotate -a -s produces output with instructions/code
> > intermingled.
>
> Thanks. I'll check out perf later (thanks for the tips!), but for
> now, here's the function which was at the top of my oprofile
> results, annotated with those options. I'm afraid it's a bit
> intimidating to me -- the last time I did much with X86 assembly
> language was in the mid-80s, on an 80286. :-/

While my assembly knowledge surely isn't from the 80s, be assured that I find it intimidating as well ;)

> Hopefully, since this is at the top of the oprofile results when
> running with prepared statements, it will be of use to somebody.

I think hash_search_with_hash_value is rather noticeable in profiles in quite many situations. Even without concurrency...

Looking at your annotation output, the code seems to be almost entirely stalled waiting for memory. The first stall is after the first memory access which is likely to be uncached (the first cacheline of the HTAB is accessed before that, but it will be in the cache). The interesting thing is that I would have expected a higher likelihood for this to stay in the cache.

  2225 0.0165 :  70b543: mov    (%rdi),%r15
              :    static inline uint32
              :    calc_bucket(HASHHDR *hctl, uint32 hash_val)
              :    {
              :        uint32 bucket;
              :
              :        bucket = hash_val & hctl->high_mask;
  4544 0.0337 :  70b546: and    0x2c(%r15),%ebx
              :        if (bucket > hctl->max_bucket)
 53409 0.3958 :  70b54a: cmp    0x28(%r15),%ebx
              :  70b54e: jbe    70b554

So a stall here is not that surprising. Here we fetch data from memory which is unlikely to be prefetchable and then require the result of that fetch. Note how "segp = hashp->dir[segment_num];" is distributed over lines 52, 64 and 83.

              :        segp = hashp->dir[segment_num];
  2062 0.0153 :  70b562: shr    %cl,%eax
   309 0.0023 :  70b564: mov    %eax,%eax
   643 0.0048 :  70b566: mov    (%rdx,%rax,8),%rbp
              :
              :        if (segp == NULL)
 43329 0.3211 :  70b56a: test   %rbp,%rbp

The next cacheline is referenced here.
Again a fetch from memory which is needed soon after to continue. Unless I misunderstood the code flow, this disproves my theory that we might have many collisions, as that test seems to be outside the loop.

               :        prevBucketPtr = &segp[segment_ndx];
               :        currBucket = *prevBucketPtr;
   122 9.0e-04 :  70b586: mov    0x0(%rbp),%rbx
               :
               :        /*
               :         * Follow collision chain looking for matching key
               :         */
               :        match = hashp->match;        /* save one fetch in inner loop */
               :        keysize = hashp->keysize;    /* ditto */
 99903  0.7404 :  70b58a: mov    %rax,0x18(%rsp)
               :
               :        while (currBucket != NULL)
  1066  0.0079 :  70b58f: test   %rbx,%rbx

Line 136 is the first time the contents of the current bucket are needed. That's why the test is so noticeable.

               :        currBucket = *prevBucketPtr;
   655  0.0049 :  70b5a3: mov    (%rbx),%rbx
               :         * Follow collision chain looking for matching key
               :         */
               :        match = hashp->match;        /* save one fetch in inner loop */
               :        keysize = hashp->keysize;    /* ditto */
               :
               :        while (currBucket != NULL)
   608  0.0045 :  70b5a6: test   %rbx,%rbx
               :  70b5a9: je     70b5d0
               :        {
               :            if (currBucket->hashvalue == hashvalue &&
  3504  0.0260 :  70b5ab: cmp    %r12d,0x8(%rbx)
 98486  0.7299 :  70b5af: nop
  1233  0.0091 :  70b5b0: jne    70b5a0

That covers all the slow points in the function. And unless I am missing something, those are all the fetched cachelines of that function -- for HASH_FIND, that is. So I think that reinforces my belief that ordinary cache misses are the culprit here. Which is to be expected in a hashtable...

Andres

PS: No idea whether that rambling made sense to anyone... But I looked at that function for the first time ;)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
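[Editor's note: the bucket calculation being profiled above is short enough to sketch standalone. The following is a simplified model of it, with names following the annotated source lines; it is not the actual dynahash code.]

```c
#include <stdint.h>

/*
 * Simplified standalone model of the bucket calculation shown in the
 * annotated output above.  Field names follow the source lines in the
 * profile, but this is a sketch, not PostgreSQL's dynahash.
 */
typedef struct
{
	uint32_t	max_bucket;		/* ID of maximum bucket in use */
	uint32_t	high_mask;		/* mask to modulo into entire table */
	uint32_t	low_mask;		/* mask to modulo into lower half of table */
} HashHeaderSketch;

static uint32_t
calc_bucket_sketch(const HashHeaderSketch *hctl, uint32_t hash_val)
{
	uint32_t	bucket = hash_val & hctl->high_mask;

	/*
	 * With linear hashing only part of the doubled table exists at any
	 * moment, so hash values mapping past max_bucket fold back into the
	 * lower half of the table.
	 */
	if (bucket > hctl->max_bucket)
		bucket = bucket & hctl->low_mask;

	return bucket;
}
```

Both masks live in the hash header, which is why the very first dereference of hctl (the `mov (%rdi),%r15`) is a natural place for a cache miss to surface in the profile.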
Re: [HACKERS] Inlining comparators as a performance optimisation
On Fri, Nov 18, 2011 at 4:38 PM, Peter Geoghegan wrote:
> I understand that we highly value extensibility and genericity (yes,
> that's a real word). We may not always be well served by that
> tendency.

True (except that genericity is not a synonym for generality AFAICT). A good fraction of the optimizations that we've done over the last, well, pick any arbitrary period of time, consists in adding special cases that occur frequently enough to be worth optimizing - e.g. HOT, or fast relation locks - often to extremely good effect.

> Firstly, 1/4 of a quite large gain is still a pretty good gain.
> Secondly, I probably didn't actually isolate the effects of inlining,
> nor the overall benefit of the compiler knowing the comparator at
> compile-time. I just removed the inline keyword.

Maybe we should look at trying to isolate that a bit better. It strikes me that we could probably create an API that would support doing either of these things depending on the wishes of the underlying datatype. For example, imagine that we're sorting with <(int4, int4). We associate a PGPROC-callable function with that operator that returns "internal", really a pointer to a struct. The first element of the struct is a pointer to a comparison function that qsort() (or a tape sort) can invoke without a trampoline; the second is a wholesale replacement for qsort(); either or both can be NULL. Given that, it seems to me that we could experiment with this pretty easily, and if it turns out that only one of them is worth doing, it's easy to drop one element out of the structure.

Or do you have another plan for how to do this?

> Fair enough, but it's not the only test I did - I posted other numbers
> for the same query when the table was 48mb, and we saw a proportional
> improvement, consistent with a per-comparison win. I'm supposed to be
> on leave for a few days at the moment, so I won't be very active this
> weekend, but I'm rather curious as to where you or others would like
> to see me go with benchmarks.
>
> I should point out that we currently don't have much idea how big of a
> win applying these principles could be for index creation times... it
> could possibly be very significant. My profiling of index creation
> makes this look promising.

Have you done any benchmarks where this saves seconds or minutes, rather than milliseconds? That would certainly make it more exciting, at least to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
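[Editor's note: the struct-based API Robert describes above could look something like the following sketch. All names here are invented for illustration; this is not actual PostgreSQL code, just the shape of the proposal.]

```c
#include <stdlib.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical shape of the proposed struct: the datatype hands back a
 * trampoline-free comparator, a wholesale qsort() replacement, or both.
 * Either member may be NULL.
 */
typedef struct SortSupportSketch
{
	int			(*comparator) (const void *a, const void *b);
	void		(*sort_replacement) (void *data, size_t nitems, size_t itemsize);
} SortSupportSketch;

/* What an int4-style datatype might supply for the comparator slot. */
static int
int32_cmp_raw(const void *a, const void *b)
{
	int32_t		av = *(const int32_t *) a;
	int32_t		bv = *(const int32_t *) b;

	return (av < bv) ? -1 : (av > bv) ? 1 : 0;
}

/* The sort code then picks whichever strategy the datatype offered. */
static void
sort_with_support(const SortSupportSketch *ssup, void *data,
				  size_t nitems, size_t itemsize)
{
	if (ssup->sort_replacement != NULL)
		ssup->sort_replacement(data, nitems, itemsize);
	else
		qsort(data, nitems, itemsize, ssup->comparator);
}
```

Dropping one member later, as Robert suggests, would only mean deleting one field and one branch here.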
Re: [HACKERS] EXPLAIN (plan off, rewrite off) for benchmarking
On Saturday, November 19, 2011 12:16:18 AM Tom Lane wrote:
> Andres Freund writes:
> > Hi,
> > For benchmarking the parser I added the above options (dim suggested
> > this on irc) which proved to be rather useful for me.
>
> What exactly is EXPLAIN printing, if you've not done planning?

Nothing very interesting:

postgres=# EXPLAIN (rewrite off) SELECT 1, 'test', "pg_class"."oid", relname, relkind
           FROM pg_class WHERE oid = 1000;
                QUERY PLAN
------------------------------------------
 not rewriting query because auf !rewrite
(1 row)

Explain is just a vehicle here, I admit that. But on what else should I bolt it? The only thing I could think of otherwise would be to do the parsing from a C function. But to simulate a real scenario there would require too much bootstrapping for my taste.

> Also, I believe the planner depends on the assumption that the
> rewriter has done its work, so these seem to amount to EXPLAIN
> (break_it).

"rewrite off" currently simply aborts before doing the rewriting and copyObject(). copyObject() is the expensive part there for simple queries. Rewriting happened to be the functional point where I wanted to stop - because of the overhead of copyObject() - instead of a goal in itself.

Btw, optimizing copyObject() memory-usage-wise would be another big performance gain. But even murkier than the stuff over in the lists thread...

> If you just want to benchmark parsing, perhaps CREATE RULE would be a
> useful environment.

I don't really see how one can use that to benchmark parsing. CREATE OR REPLACE VIEW is - not unexpectedly - far slower than EXPLAIN (rewrite off) or EXPLAIN (plan off).
For the statement: SELECT * FROM pg_class WHERE oid = 1000;

rewrite off:                tps = 16086.694353
plan off, no copyObject():  tps = 15663.280093
plan off:                   tps = 13471.272551
explain:                    tps = 6162.161946
explain analyze:            tps = 5744.172839
normal execution:           tps = 6991.398740
CREATE OR REPLACE VIEW
(after changing the log level): tps = 2550.246625

Greetings,

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing ProcArrayLock patches
On Fri, Nov 18, 2011 at 6:46 PM, Kevin Grittner wrote:
>>> tps = 21946.961196 (including connections establishing)
>>> tps = 22911.873227 (including connections establishing)
>>>
>>> For write transactions, that seems pretty respectable.
>>
>> Very. What do you get without the patch?
>
> [quick runs a couple tests that way]
>
> Single run with -M simple:
>
> tps = 23018.314292 (including connections establishing)
>
> Single run with -M prepared:
>
> tps = 27910.621044 (including connections establishing)
>
> So, the patch appears to hinder performance in this environment,
> although certainty is quite low with so few samples. I'll schedule
> a spectrum of runs before I leave this evening (very soon).

Hmm. There's obviously something that's different in your environment or configuration from what I tested, but I don't know what it is. The fact that your scale factor is larger than shared_buffers might matter; or Intel vs. AMD. Or maybe you're running with synchronous_commit=on?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing ProcArrayLock patches
Robert Haas wrote:
> Hmm. That looks a lot like a profile with no lock contention at
> all. Since I see XLogInsert in there, I assume this must be a
> pgbench write test on unlogged tables? How close am I?

Not unless pgbench on HEAD does that by default. Here are the relevant statements:

$prefix/bin/pgbench -i -s 150
$prefix/bin/pgbench -T $time -c $clients -j $clients >>$resultfile

Perhaps the Intel cores implement the relevant primitives better? Maybe I didn't run the profile or the reports the right way?

> I was actually thinking it would be interesting to oprofile the
> read-only test; see if we can figure out where those slowdowns are
> coming from.

I'll plan on doing that this weekend.

>> tps = 21946.961196 (including connections establishing)
>> tps = 22911.873227 (including connections establishing)
>>
>> For write transactions, that seems pretty respectable.
>
> Very. What do you get without the patch?

[quick runs a couple tests that way]

Single run with -M simple:

tps = 23018.314292 (including connections establishing)

Single run with -M prepared:

tps = 27910.621044 (including connections establishing)

So, the patch appears to hinder performance in this environment, although certainty is quite low with so few samples. I'll schedule a spectrum of runs before I leave this evening (very soon).

-Kevin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing ProcArrayLock patches
On Fri, Nov 18, 2011 at 2:05 PM, Kevin Grittner wrote:
> Robert Haas wrote:
>> Any chance you can run oprofile (on either branch, don't really
>> care) against the 32 client test and post the results?
>
> [ oprofile results ]

Hmm. That looks a lot like a profile with no lock contention at all. Since I see XLogInsert in there, I assume this must be a pgbench write test on unlogged tables? How close am I?

I was actually thinking it would be interesting to oprofile the read-only test; see if we can figure out where those slowdowns are coming from.

> Two runs:
>
> tps = 21946.961196 (including connections establishing)
> tps = 22911.873227 (including connections establishing)
>
> For write transactions, that seems pretty respectable.

Very. What do you get without the patch?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing ProcArrayLock patches
Andres Freund wrote:
> I think opannotate -a -s produces output with instructions/code
> intermingled.

Thanks. I'll check out perf later (thanks for the tips!), but for now, here's the function which was at the top of my oprofile results, annotated with those options. I'm afraid it's a bit intimidating to me -- the last time I did much with X86 assembly language was in the mid-80s, on an 80286. :-/ Hopefully, since this is at the top of the oprofile results when running with prepared statements, it will be of use to somebody.

The instructions which are shown as having that 1% still seem odd to me, but as you say, they were probably actually waiting for some previous operation to finish:

 43329  0.3211 :  70b56a: test   %rbp,%rbp
 99903  0.7404 :  70b58a: mov    %rax,0x18(%rsp)

If anyone wants any other detail from what I captured, let me know.

-Kevin

0070b520 <hash_search_with_hash_value>: /* hash_search_with_hash_value total: 495463  3.6718 */
               :hash_search_with_hash_value(HTAB *hashp,
               :                            const void *keyPtr,
               :                            uint32 hashvalue,
               :                            HASHACTION action,
               :                            bool *foundPtr)
               :{
  5023  0.0372 :  70b520: push   %r15
  5967  0.0442 :  70b522: push   %r14
  1407  0.0104 :  70b524: mov    %rdi,%r14
    30 2.2e-04 :  70b527: push   %r13
  2495  0.0185 :  70b529: push   %r12
  2631  0.0195 :  70b52b: mov    %edx,%r12d
    18 1.3e-04 :  70b52e: push   %rbp
  1277  0.0095 :  70b52f: push   %rbx
               :    static inline uint32
               :    calc_bucket(HASHHDR *hctl, uint32 hash_val)
               :    {
               :        uint32 bucket;
               :
               :        bucket = hash_val & hctl->high_mask;
  2122  0.0157 :  70b530: mov    %edx,%ebx
               :hash_search_with_hash_value(HTAB *hashp,
               :                            const void *keyPtr,
               :                            uint32 hashvalue,
               :                            HASHACTION action,
               :                            bool *foundPtr)
               :{
   247  0.0018 :  70b532: sub    $0x58,%rsp
   236  0.0017 :  70b536: mov    %rsi,0x10(%rsp)
  3851  0.0285 :  70b53b: mov    %ecx,0xc(%rsp)
  2551  0.0189 :  70b53f: mov    %r8,(%rsp)
               :    HASHHDR *hctl = hashp->hctl;
  2225  0.0165 :  70b543: mov    (%rdi),%r15
               :    static inline uint32
               :    calc_bucket(HASHHDR *hctl, uint32 hash_val)
               :    {
               :        uint32 bucket;
               :
               :        bucket = hash_val & hctl->high_mask;
  4544  0.0337 :  70b546: and    0x2c(%r15),%ebx
               :        if (bucket > hctl->max_bucket)
 53409  0.3958 :  70b54a: cmp    0x28(%r15),%ebx
               :  70b54e: jbe    70b554
               :            bucket = bucket & hctl->low_mask;
  3324  0.0246 :  70b550: and    0x30(%r15),%ebx
               :    bucket = calc_bucket(hctl, hashvalue);
               :
               :    segment_num = bucket >> hashp->sshift;
               :    segment_ndx = MOD(bucket, hashp->ssize);
               :
               :    segp = hashp->dir[segment_num];
  9702  0.0719 :  70b554: mov    0x58(%r14),%ecx
  2428  0.0180 :  70b558: mov    %ebx,%eax
   489  0.0036 :  70b55a: mov    0x8(%r14),%rdx
               :     * Do the initial lookup
               :     */
               :    bucket = calc_bucket(hctl, hashvalue);
               :
               :    segment_num = bucket >> hashp->sshift;
               :    segment_ndx = MOD(bucket, hashp->ssize);
   391  0.0029 :  70b55e: mov    0x50(%r14),%r13
               :
               :    segp = hashp->dir[segment_num];
  2062  0.0153 :  70b562: shr    %cl,%eax
   309  0.0023 :  70b564: mov    %eax,%eax
   643  0.0048 :  70b566: mov    (%rdx,%rax,8),%rbp
               :
               :    if (segp == NULL)
 43329  0.3211 :  70b56a: test   %rbp,%rbp
  1284  0.0095 :  70b56d: je     70b727
               :        hash_corrupted(hashp);
               :
               :    prevBucketPtr = &segp[segment_ndx];
  1878  0.0139 :  70b573: lea    -0x1(%r13),%rax
               :    currBucket = *prevBucketPtr;
               :
               :    /*
               :     * Follow collision chain looking for matching key
               :     */
               :    match = hashp->match;
Re: [HACKERS] EXPLAIN (plan off, rewrite off) for benchmarking
Andres Freund writes:
> Hi,
> For benchmarking the parser I added the above options (dim suggested
> this on irc) which proved to be rather useful for me.

What exactly is EXPLAIN printing, if you've not done planning? Also, I believe the planner depends on the assumption that the rewriter has done its work, so these seem to amount to EXPLAIN (break_it).

If you just want to benchmark parsing, perhaps CREATE RULE would be a useful environment.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should a materialized view be based on a view?
On 18 November 2011 22:26, Kevin Grittner wrote:
> I still have a lot of reading to do before I propose anything
> concrete for development, but one thing that has already struck me
> as a common theme for MVs is that a lot of people seem to like the
> idea of first creating a "normal" view, and then materializing it.
> That seems pretty attractive to me, too. How do people feel about
> that as a fundamental design decision: that a MV would always have
> a corresponding view (under a different name or in a different
> schema)? Love it or hate it?

Is there a need to create it as a normal view first? Can't the CREATE VIEW syntax be expanded to support MV capabilities? (CREATE [ MATERIALIZED ] VIEW...) And then ALTER VIEW can materialise a regular view, or dematerialise a materialised view.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should a materialized view be based on a view?
On 18 November 2011 23:26, Kevin Grittner wrote:
> I still have a lot of reading to do before I propose anything
> concrete for development, but one thing that has already struck me
> as a common theme for MVs is that a lot of people seem to like the
> idea of first creating a "normal" view, and then materializing it.
> That seems pretty attractive to me, too. How do people feel about
> that as a fundamental design decision: that a MV would always have
> a corresponding view (under a different name or in a different
> schema)? Love it or hate it?
>
> -Kevin

Hi Kevin,

Maybe a stupid question... but why? It looks like to create a function I should first create another function. For me the design should be simple. If you want to create something below my MV, that's fine with me, as long as I don't need to know about it (just like when creating a serial column).

regards
Szymon
Re: [HACKERS] testing ProcArrayLock patches
On Friday, November 18, 2011 11:12:02 PM Andres Freund wrote:
> On Friday, November 18, 2011 09:16:01 PM Kevin Grittner wrote:
> > Andres Freund wrote:
> > > When doing line-level profiles I would suggest looking at the
> > > instructions.
> >
> > What's the best way to do that?
>
> I think opannotate -a -s produces output with instructions/code
> intermingled.
>
> > > I don't think cache line contention is the most likely candidate
> > > here. Simple cache-misses seem far more likely. In combination
> > > with pipeline stalls...
> > >
> > > Newer cpus (nehalem+) can measure stalled cycles which can be
> > > really useful when analyzing performance. I don't remember how to
> > > do that with oprofile right now though as I use perf these days
> > > (it's -e stalled-cycles-{frontend|backend} there).
> >
> > When I run oprofile, I still always go back to this post by Tom:
> > http://archives.postgresql.org/pgsql-performance/2009-06/msg00154.php
>
> Hrm. I am on the train and for unknown reasons the only sensibly working
> protocols are smtp + pop... Waiting... Waiting...
> Sorry, too slow/high latency atm. I wrote everything below and another
> mail and the page still hasn't loaded.
>
> oprofile can produce graphs as well (--callgraph). For both tools you
> need -fno-omit-frame-pointer to get usable graphs.
>
> > Can anyone provide such a "cheat sheet" for perf? I could give that
> > a try if I knew how.
>
> Unfortunately for sensible results the kernel needs to be rather new.
> I would say > 2.6.28 or so (just guessed).
>
> # to record activity
> perf record [-g|--call-graph] program|-p pid
>
> # to view a summation
> perf report
>
> # get heaps of stats from something
> perf stat -ddd someprogram|-p pid
>
> # show what the system is executing overall
> perf top -az
>
> # get help
> perf help (record|report|annotate|stat|...)
> ...
I forgot that there is also

# get a list of event types
perf list

# measure something for a specific event
perf (record|stat|top) -e some_event_type

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing ProcArrayLock patches
On Friday, November 18, 2011 09:16:01 PM Kevin Grittner wrote:
> Andres Freund wrote:
> > When doing line-level profiles I would suggest looking at the
> > instructions.
>
> What's the best way to do that?

I think opannotate -a -s produces output with instructions/code intermingled.

> > I don't think cache line contention is the most likely candidate
> > here. Simple cache-misses seem far more likely. In combination
> > with pipeline stalls...
> >
> > Newer cpus (nehalem+) can measure stalled cycles which can be
> > really useful when analyzing performance. I don't remember how to
> > do that with oprofile right now though as I use perf these days
> > (it's -e stalled-cycles-{frontend|backend} there).
>
> When I run oprofile, I still always go back to this post by Tom:
> http://archives.postgresql.org/pgsql-performance/2009-06/msg00154.php

Hrm. I am on the train and for unknown reasons the only sensibly working protocols are smtp + pop... Waiting... Waiting... Sorry, too slow/high latency atm. I wrote everything below and another mail and the page still hasn't loaded.

oprofile can produce graphs as well (--callgraph). For both tools you need -fno-omit-frame-pointer to get usable graphs.

> Can anyone provide such a "cheat sheet" for perf? I could give that
> a try if I knew how.

Unfortunately for sensible results the kernel needs to be rather new. I would say > 2.6.28 or so (just guessed).

# to record activity
perf record [-g|--call-graph] program|-p pid

# to view a summation
perf report

graph:
# Overhead   Command  Shared Object  Symbol
# ........   .......  .............  ......
#
     4.09%  postgres  postgres       [.] slab_alloc_dyn
            |
            --- slab_alloc_dyn
               |
               |--18.52%-- new_list
               |          |
               |          |--63.79%-- lappend
               |          |          |
               |          |          |--13.40%-- find_usable_indexes
               |          |          |           create_index_paths
               |          |          |           set_rel_pathlist
               |          |          |           make_one_rel

flat:
# Overhead   Command  Shared Object  Symbol
# ........   .......  .............  ......
#
     5.10%  postgres  [vdso]         [.] 0x73d8d770
     4.26%  postgres  postgres       [.] base_yyparse
     3.88%  postgres  postgres       [.] slab_alloc_dyn
     2.82%  postgres  postgres       [.] core_yylex
     2.37%  postgres  postgres       [.] SearchCatCache
     1.85%  postgres  libc-2.13.so   [.] __memcpy_ssse3
     1.66%  postgres  libc-2.13.so   [.] __GI___strcmp_ssse3
     1.23%  postgres  postgres       [.] MemoryContextAlloc

# to view a line/source/instruction level view
perf annotate -l symbol

...
         :        /*
         :         * one-time startup overhead for each cache
         :         */
         :        if (cache->cc_tupdesc == NULL)
    0.35 :  6e81fd: 48 83 7f 28 00        cmpq   $0x0,0x28(%rdi)
 /home/andres/src/postgresql/build/optimize/../../src/backend/utils/cache/catcache.c:1070
    4.15 :  6e8202: 0f 84 54 04 00 00     je     6e865c
         : #endif
         :
         :        /*
         :         * initialize the search key information
         :         */
         :        memcpy(cur_skey, cache->cc_skey, sizeof(cur_skey));
    0.00 :  6e8208: 48 8d bd a0 fe ff ff  lea    -0x160(%rbp),%rdi
    0.17 :  6e820f: 49 8d 77 70           lea    0x70(%r15),%rsi
    0.00 :  6e8213: b9 24 00 00 00        mov    $0x24,%ecx
 /home/andres/src/postgresql/build/optimize/../../src/backend/utils/cache/catcache.c:1080
   33.22 :  6e8218: f3 48 a5              rep movsq %ds:(%rsi),%es:(%rdi)
         :        cur_skey[0].sk_argument = v1;
 /home/andres/src/postgresql/build/optimize/../../src/backend/utils/cache/catcache.c:1081
    1.56 :  6e821b: 48 89 9d e0 fe ff ff  mov    %rbx,-0x120(%rbp)
...

# get heaps of stats from something
perf stat -ddd someprogram|-p pid

       1242.409965 task-clock              #    0.824 CPUs utilized    [100.00%]
            14,572 context-switches        #    0.012 M/sec            [100.00%]
               264 CPU-migrations          #    0.000 M/sec            [100.00%]
                 0 page-faults             #    0.000 M/sec
     2,854,775,135 cycles                  #    2.298 GHz              [26.28%]
                   stalled-cycles-frontend
                   stalled-cycles-backend
     2,024,997,785 instructions            #    0.71  insns per cycle  [25.25%]
       387,240,903 bran
[HACKERS] Should a materialized view be based on a view?
I still have a lot of reading to do before I propose anything concrete for development, but one thing that has already struck me as a common theme for MVs is that a lot of people seem to like the idea of first creating a "normal" view, and then materializing it. That seems pretty attractive to me, too.

How do people feel about that as a fundamental design decision: that a MV would always have a corresponding view (under a different name or in a different schema)? Love it or hate it?

-Kevin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inlining comparators as a performance optimisation
On Fri, Nov 18, 2011 at 12:20:26AM -0500, Robert Haas wrote:
> I think that we should really consider doing with this patch what Tom
> suggested upthread; namely, looking for a mechanism to allow
> individual datatypes to offer up a comparator function that doesn't
> require bouncing through FunctionCall2Coll(). It seems to me that
> duplicating the entire qsort() algorithm is a little iffy. Sure, in
> this case it works out to a win. But it's only a small win -
> three-quarters of it is in the uncontroversial activity of reducing
> the impedance mismatch - and how do we know it will always be a win?

There's always the old idea of a data type providing a function mapping f:(type -> int64) in such a way that it preserves order. That is, in the sense that:

    f(x) < f(y) => x < y

When sorting, you add f(x) as a hidden column and change "ORDER BY x" to "ORDER BY f(x), x". Then you only need to special-case the int64 version. This would mean that in most cases you may be able to skip the call because you're comparing integers.

The downside is you need to call f on each input. It depends on the datatype whether that's cheaper or not, but for all numeric types I think it's an easy win.

I don't think anyone has written a proof of concept for this. It does have the advantage of scaling better than coding a qsort for each individual type.

Have a nice day,

--
Martijn van Oosterhout http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he
> does not attach much importance to his own thoughts. -- Arthur Schopenhauer
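[Editor's note: as a rough illustration of the order-preserving mapping Martijn describes, here is what f could look like for two types. Helper names are invented; for floats the usual IEEE 754 bit-twiddling trick is used, ignoring NaN and the -0.0 vs 0.0 distinction.]

```c
#include <stdint.h>
#include <string.h>

/* f for a 32-bit integer: widening to int64 trivially preserves order. */
static int64_t
int32_sort_key(int32_t x)
{
	return (int64_t) x;
}

/*
 * f for a float: bit patterns of non-negative IEEE 754 floats are ordered
 * like the floats themselves, while negative floats sort in reverse bit
 * order.  Flipping bits accordingly yields an order-preserving unsigned
 * key (NaN and -0.0 vs 0.0 edge cases are ignored in this sketch).
 */
static int64_t
float_sort_key(float x)
{
	uint32_t	bits;
	uint32_t	key;

	memcpy(&bits, &x, sizeof(bits));
	if (bits & UINT32_C(0x80000000))
		key = ~bits;						/* negative: flip everything */
	else
		key = bits | UINT32_C(0x80000000);	/* non-negative: set sign bit */
	return (int64_t) key;
}
```

With such keys in a hidden column, the common path of the sort compares plain integers and only falls back to the full datatype comparator to break ties.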
Re: [HACKERS] Materialized views
"Kevin Grittner" wrote: > I'm considering submitting a proposal to management that I be > assigned to work on a declarative implementation in PostgreSQL to > allow speedier application development of software needing > materialized views. Thanks to all who provided feedback and support in response to my post. Based on the feedback here and off-list, I did submit a proposal. It was just approved by the appropriate steering committee (consisting of our CIO, the Director of State Courts, District Court Administrators, Judges, Clerks of Court, and other stake-holders) as a low-priority project. That means that I expect I'll have the time to get a patch together in time for 9.3, but the times at which the decks will be clear of other assignments to allow work on this will not be very predictable. I'll probably be on-again, off-again throughout the year. I apologize in advance for the fact that the times when I will be able to work on it might not fit well with the release cycle or CFs, but I kinda have to take what I can get in that regard. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Replace a long chain of if's in eval_const_expressions_mutator by a switch()
On Friday, November 18, 2011 10:14:22 PM Tom Lane wrote:
> Andres Freund writes:
> > For unknown reasons the function used non-chained ifs for every
> > handled nodeType. Replacing the if chain with if; else if; ...
> > resulted in a small speedup. Replacing it with a switch() in a
> > bigger one.
>
> Cool, but this patch is impossible to validate by eye. Could you
> resubmit a version that doesn't reindent unchanged code? Leave it
> for pgindent to clean that up later.

Sure. It was just too confusing reading the code without reindenting.

Btw, I found git diff/show/blame -w very useful to view reindented code. Actually git show -w seems to produce an applyable patch which doesn't reindent...

Andres

commit e114e3a2708cab8efd64281d09cecbd6303aa329
Author: Andres Freund
Date:   Fri Nov 11 23:32:02 2011 +0100

    Replace a long chain of if's in eval_const_expressions_mutator by a switch()

    For unknown reasons the function used non-chained ifs for every handled
    nodeType. Replacing the if chain with if; else if; ... resulted in a small
    speedup. Replacing it with a switch() in a bigger one.
When testing with a statement containing a longer VALUES statement:

pgbench -M prepared -f stmt -T 10:
orig:   10015.287829
if:     10075.482117
switch: 10246.527402

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 197d9c2..f6f0b11 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -2106,7 +2106,9 @@ eval_const_expressions_mutator(Node *node,
 {
 	if (node == NULL)
 		return NULL;
-	if (IsA(node, Param))
+	switch(nodeTag(node))
+	{
+	case T_Param:
 		{
 			Param *param = (Param *) node;
 
@@ -2152,7 +2154,7 @@ eval_const_expressions_mutator(Node *node,
 		/* Not replaceable, so just copy the Param (no need to recurse) */
 		return (Node *) copyObject(param);
 	}
-	if (IsA(node, FuncExpr))
+	case T_FuncExpr:
 	{
 		FuncExpr *expr = (FuncExpr *) node;
 		List *args;
@@ -2210,7 +2212,7 @@ eval_const_expressions_mutator(Node *node,
 		newexpr->location = expr->location;
 		return (Node *) newexpr;
 	}
-	if (IsA(node, OpExpr))
+	case T_OpExpr:
 	{
 		OpExpr *expr = (OpExpr *) node;
 		List *args;
@@ -2275,7 +2277,7 @@ eval_const_expressions_mutator(Node *node,
 		newexpr->location = expr->location;
 		return (Node *) newexpr;
 	}
-	if (IsA(node, DistinctExpr))
+	case T_DistinctExpr:
 	{
 		DistinctExpr *expr = (DistinctExpr *) node;
 		List *args;
@@ -2372,7 +2374,7 @@ eval_const_expressions_mutator(Node *node,
 		newexpr->location = expr->location;
 		return (Node *) newexpr;
 	}
-	if (IsA(node, BoolExpr))
+	case T_BoolExpr:
 	{
 		BoolExpr *expr = (BoolExpr *) node;
 
@@ -2440,9 +2442,8 @@ eval_const_expressions_mutator(Node *node,
 			break;
 		}
 	}
-	if (IsA(node, SubPlan) ||
-		IsA(node, AlternativeSubPlan))
-	{
+	case T_SubPlan:
+	case T_AlternativeSubPlan:
 		/*
 		 * Return a SubPlan unchanged --- too late to do anything with it.
 		 *
@@ -2450,8 +2451,7 @@ eval_const_expressions_mutator(Node *node,
 		 * never be invoked after SubPlan creation.
 		 */
 		return node;
-	}
-	if (IsA(node, RelabelType))
+	case T_RelabelType:
 	{
 		/*
 		 * If we can simplify the input to a constant, then we don't need the
@@ -2493,7 +2493,7 @@ eval_const_expressions_mutator(Node *node,
 			return (Node *) newrelabel;
 		}
 	}
-	if (IsA(node, CoerceViaIO))
+	case T_CoerceViaIO:
 	{
 		CoerceViaIO *expr = (CoerceViaIO *) node;
 		Expr *arg;
@@ -2569,7 +2569,7 @@ eval_const_expressions_mutator(Node *node,
 		newexpr->location = expr->location;
 		return (Node *) newexpr;
 	}
-	if (IsA(node, ArrayCoerceExpr))
+	case T_ArrayCoerceExpr:
 	{
 		ArrayCoerceExpr *expr = (ArrayCoerceExpr *) node;
 		Expr *arg;
@@ -2607,7 +2607,7 @@ eval_const_expressions_mutator(Node *node,
 		/* Else we must return the partially-simplified node */
 		return (Node *) newexpr;
 	}
-	if (IsA(node, CollateExpr))
+	case T_CollateExpr:
 	{
 		/*
 		 * If we can simplify the input to a constant, then we don't need the
@@ -2652,7 +2652,7 @@ eval_const_expressions_mutator(Node *node,
 			return (Node *) relabel;
 		}
 	}
-	if (IsA(node, CaseExpr))
+	case T_CaseExpr:
 	{
 		/*--
 		 * CASE expressions can be simplified if there are constant
@@ -2783,7 +2783,7 @@ eval_const_expressions_mutator(Node *node,
 		newcase->location = caseexpr->location;
 		return (Node *) newcase;
 	}
-	if (IsA(node, CaseTestExpr))
+	case T_CaseTestExpr:
 	{
 		/*
 		 * If we know a constant test value for the current CASE construct,
@@ -2795,7 +2795,7 @@ eval_const_expressions_mutator(Node *node,
 		else
 			return copyObject(node);
 	}
-	if (IsA(node, ArrayExpr))
+	case T_ArrayExpr:
 	{
 		ArrayExpr *arrayexpr = (ArrayExpr *) node;
 		ArrayExpr *newarray;
@@ -2831,7 +2831,7 @@ eval_const_expressions_mutator(Node *node,
 
 		return (Node *) newarray;
 	}
-	if (IsA(node, CoalesceExpr))
+	case T_CoalesceExpr:
 	{
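[Editor's note: the shape of the transformation in the patch above, reduced to a toy example with invented tag names. A chain of independent IsA() tests compiles to a series of compares, while a switch over the node tag lets the compiler emit a jump table.]

```c
/* Toy model of the rewrite; tag names are invented, not real node types. */
typedef enum
{
	T_PARAM_SKETCH,
	T_FUNCEXPR_SKETCH,
	T_OPEXPR_SKETCH,
	T_BOOLEXPR_SKETCH
} NodeTagSketch;

/* Before: every tag is tested in sequence, even for the last node type. */
static int
mutate_if_chain(NodeTagSketch tag)
{
	if (tag == T_PARAM_SKETCH)
		return 1;
	if (tag == T_FUNCEXPR_SKETCH)
		return 2;
	if (tag == T_OPEXPR_SKETCH)
		return 3;
	if (tag == T_BOOLEXPR_SKETCH)
		return 4;
	return 0;
}

/* After: a single dispatch on the tag, typically compiled to a jump table. */
static int
mutate_switch(NodeTagSketch tag)
{
	switch (tag)
	{
		case T_PARAM_SKETCH:
			return 1;
		case T_FUNCEXPR_SKETCH:
			return 2;
		case T_OPEXPR_SKETCH:
			return 3;
		case T_BOOLEXPR_SKETCH:
			return 4;
	}
	return 0;
}
```

Both functions behave identically; the win Andres measured comes purely from the dispatch cost, which grows linearly in the if-chain version but stays roughly constant with a jump table.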
Re: [HACKERS] RFC: list API / memory allocations
On Friday, November 18, 2011 10:11:29 PM Tom Lane wrote: > Andres Freund writes: > > In many scenarios memory allocation is one of the top 3 functions showing > > up in profiles. Looking at hierarchical profiles > > (-fno-omit-frame-pointer) at least during parsing, planning and executor > > startup most of that is spent around the list API. > > > > Many - especially in the parse-analyze phase - of those allocations can > > be avoided because the lists are immutable and their size is known > > upfront. > > The fundamental problem with all of those proposals is that now you have > some lists in the system that aren't like other lists, and will result > in dumping core if the wrong sorts of operations are applied to them. > I don't particularly care for introducing that kind of fragility into > the system in return for marginal speed gains. I'm not impressed by > Asserts showing that no such thing happens in the cases you tested; > the test coverage won't be complete, and even if it is, innocent-looking > code changes later on could create new problems. Yes, I dislike that myself (as noted). It seems rather fragile, although I think that at least during parsing we could simply refrain from freeing list cells at all. I don't think that the gains are marginal, though. After covering only a small number of cases, there are not-uncommon, non-artificial cases gaining more than 20% parsing speed. One prime example of a workload benefiting hugely is something like SELECT * FROM pg_class WHERE oid = ...; Essentially all queries which request few rows with a large number of columns benefit rather measurably. > Now, if you could do something that *doesn't* restrict what operations > could be applied to the lists, that would be good. If every list cell/header grew another field "allocate_only" (which I currently added for cassert builds only), such cells could simply be skipped when deleting. Some places free list headers directly, but that seems like a bad idea anyway. 
The disadvantage is that those cells would essentially be there until context reset unless released through some special API. Also not that nice... On the other hand, only very few callsites free list(-cells) during parsing anyway. I didn't see a measurable increase in memory usage due to the new field with that approach. The only way out of that seems to be to add refcounted list cells/headers :(. I fear that would be rather complex, expensive and failure prone, so I don't like to go there. > I've wished for a long while that we could allocate the list header and > the first list cell in a single palloc cycle. Yea. Although that is only a rather small portion of the problems/allocations. > This would basically > require getting list_delete_cell to refrain from pfree'ing a cell that > got allocated that way, which is easy as long as you have the list > header at hand, but what happens if the list is later concat'd to > another? A subsequent delete operation would be referring to the other > list header and would come to the wrong conclusion. I don't think any such scheme is safe. > While thinking about this just now, it occurred to me that maybe the > issues could be dodged if the cell, not the header, were first in the > combined palloc block. list_concat is then no longer a problem, as long > as it doesn't try to throw away the second list's header. But I haven't > thought long enough to be sure how well that would work. I don't think that would work without carefully revising list usage all around... Several places remove nodes from a list and then do list_free() on the remainder. Something aside: for my POC memory allocator I added "intrusive" lists, which have the next/prev elements embedded in the stored element. I wonder if some of the list usage could be replaced by such a scheme. Obviously, for every embedded list_node a Node can only be in one list... 
Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inlining comparators as a performance optimisation
On 18 November 2011 05:20, Robert Haas wrote: > I think that we should really consider doing with this patch what Tom > suggested upthread; namely, looking for a mechanism to allow > individual datatypes to offer up a comparator function that doesn't > require bouncing through FunctionCall2Coll(). It seems to me that > duplicating the entire qsort() algorithm is a little iffy. I understand that we highly value extensibility and genericity (yes, that's a real word). We may not always be well served by that tendency. > Sure, in this case it works out to a win. But it's only a small win - > three-quarters of it is in the uncontroversial activity of reducing > the impedance mismatch - and how do we know it will always be a win? Firstly, 1/4 of a quite large gain is still a pretty good gain. Secondly, I probably didn't actually isolate the effects of inlining, nor the overall benefit of the compiler knowing the comparator at compile time. I just removed the inline keyword. Those are two different things. The inline keyword serves as a request to the compiler to inline. The compiler can, and often will, ignore that request. Most people know that. What isn't so widely known is that modern compilers may equally well inline even when they haven't been asked to (but only when they can). When you also consider, as I've already pointed out several times, that inlining decisions are made at individual call sites, it becomes apparent that a certain amount of inlining, and of other optimisations like procedural integration, may well still have been going on at some call sites even without the encouragement of the inline keyword; none of that would have been possible without the benefit of compile-time comparator knowledge. The addition of the inline keyword may just, in this particular case, have the compiler inline even more call sites. 
Posting the ~8ms difference was motivated by a desire to prove that inlining had *some* role to play, without actually going to the trouble of implementing Tom's idea as a basis of comparison, because Tom was very sceptical of inlining. The long and the short of it is that I'm going to have to get my hands dirty with a disassembler before we really know exactly what's happening. That, or I could use an optimisation fence of some type. > Adding more copies of the same code can be an anti-optimization if it > means that a smaller percentage of it fits in the instruction cache, > and sometimes small changes in runtime are caused by random shifts in > the layout of memory that align things more or less favorably across > cache lines rather than by real effects. Now it may well be that this > is a real effect, but will it still look as good when we do this for > 10 data types? For 100 data types? I'd favour limiting it to just the common integer and float types. > In contrast, it seems to me that reducing the impedance mismatch is > something that we could go and do across our entire code base, and > every single data type would benefit from it. It would also be > potentially usable by other sorting algorithms, not just quick sort. Suppose that we went ahead and added that infrastructure. What you must acknowledge is that one reason that this speed-up is so dramatic is that the essential expense of a comparison is already so low - a single instruction - and therefore the overall per-comparison cost goes way down, particularly if the qsort inner loop can store the code across fewer cache lines. For that reason, any absolute improvement that you'll see in complex datatypes will be smaller, maybe much smaller, because for each comparison we'll execute many more instructions that are essential to the comparison. In my estimation, all of this work does not point to there being an undue overhead in the function calling convention as you suggested. 
Still, I'm not opposed to investigating generalising this in some way, reservations notwithstanding, unless we have to block-wait on it. I don't want to chase diminishing returns too far. > Well, that's kind of my point. I think this needs more work before we > decide what the best approach is. Agreed. > So far, the ONLY test result we > have that compares inlining to not inlining shows a speedup from 60 ms > to 52 ms. I think that an 8 ms speedup on one test with one datatype > on one platform/compiler combination isn't sufficient evidence to > conclude that this is the best possible approach. Fair enough, but it's not the only test I did - I posted other numbers for the same query when the table was 48mb, and we saw a proportional improvement, consistent with a per-comparison win. I'm supposed to be on leave for a few days at the moment, so I won't be very active this weekend, but I'm rather curious as to where you or others would like to see me go with benchmarks. I should point out that we currently don't have much idea how big of a win applying these principles could be for index creation times...it could possibly be very significant. My profiling of index
Re: [HACKERS] [PATCH] Replace a long chain of if's in eval_const_expressions_mutator by a switch()
Andres Freund writes: > For unknown reasons the function used non chained ifs for every handled > nodeType. > Replacing the if chain with if; else if; ... resulted in a small > speedup. Replacing it with a switch() in a bigger one. Cool, but this patch is impossible to validate by eye. Could you resubmit a version that doesn't reindent unchanged code? Leave it for pgindent to clean that up later. regards, tom lane
[HACKERS] EXPLAIN (plan off, rewrite off) for benchmarking
Hi, For benchmarking the parser I added the above options (dim suggested this on irc) which proved to be rather useful for me. I added the additional rewrite option because the overhead of copying the tree around makes the profile significantly less expressive. I would also like an option which would only do the actual parsing instead of parse + parse analyse but that seemed a tad more complicated... Is anybody else interested in such EXPLAIN options? Andres
Re: [HACKERS] RFC: list API / memory allocations
Andres Freund writes: > In many scenarios memory allocation is one of the top 3 functions showing up > in profiles. Looking at hierarchical profiles (-fno-omit-frame-pointer) at > least > during parsing, planning and executor startup most of that is spent around > the > list API. > Many - especially in the parse-analyze phase - of those allocations can be > avoided because the lists are immutable and their size is known upfront. The fundamental problem with all of those proposals is that now you have some lists in the system that aren't like other lists, and will result in dumping core if the wrong sorts of operations are applied to them. I don't particularly care for introducing that kind of fragility into the system in return for marginal speed gains. I'm not impressed by Asserts showing that no such thing happens in the cases you tested; the test coverage won't be complete, and even if it is, innocent-looking code changes later on could create new problems. Now, if you could do something that *doesn't* restrict what operations could be applied to the lists, that would be good. I've wished for a long while that we could allocate the list header and the first list cell in a single palloc cycle. This would basically require getting list_delete_cell to refrain from pfree'ing a cell that got allocated that way, which is easy as long as you have the list header at hand, but what happens if the list is later concat'd to another? A subsequent delete operation would be referring to the other list header and would come to the wrong conclusion. While thinking about this just now, it occurred to me that maybe the issues could be dodged if the cell, not the header, were first in the combined palloc block. list_concat is then no longer a problem, as long as it doesn't try to throw away the second list's header. But I haven't thought long enough to be sure how well that would work. 
regards, tom lane
[HACKERS] RFC: list API / memory allocations
Hi List,

In many scenarios memory allocation is one of the top 3 functions showing up in profiles. Looking at hierarchical profiles (-fno-omit-frame-pointer), at least during parsing, planning and executor startup, most of that is spent around the list API. Many - especially in the parse-analyze phase - of those allocations can be avoided, because the lists are immutable and their size is known upfront. Some examples:

parser: list_make$n, buildRelationAliases
planner: expression_tree_mutator, get_relation_info
executor: ExecInitExpr

For that I added new functions/defines which allocate all the needed memory in one hunk:

list_immutable_make$n()
List *list_new_immutable_n(NodeTag t, size_t n);
List *list_make_n(NodeTag t, size_t n, ...);

With those I could avoid about 1/3 of the allocations in some example scenarios (pgbench, custom benchmarks) by replacing a few notable callsites to use statically allocated lists. The obvious problem with that approach is that those Lists and ListCells cannot be individually deallocated - which, especially in the planner, isn't done anyway. To check that, I added an allocate_only flag to both when USE_ASSERT_CHECKING is defined. Using that approach I measured improvements between 0.5-20% depending on the statement (using -M simple). Complex statements which are quick to execute and not too complicated to plan are the ones benefiting most, which is not surprising.

The questions I have here:

* Is that approach acceptable? I converted the most notable callsites and the benefit is quite measurable. On the other hand, one has to be rather careful when analyzing whether lists will be deallocated later on. Also, it touches rather a lot of code.
* I plan to add list_copy_immutable as another function, as that is the next biggest scenario where memory is allocated in foreseeable scenarios.
* Any suggestions what name to use instead of "immutable"? That doesn't really capture the real meaning of "allocate only". I didn't find a better name though. 
Then there is the 2/3 of calls which I couldn't beat with that approach, mostly because the size of the lists is too hard to figure out. My current approach to that would be to preallocate list cells by some caller-defined amount. A *very* quick hack which always overallocates Lists to contain 10 elements yields promising results (around another 2-20% of parse-only time). The problem is that the most sensible way seems to be to define the amount of preallocation per list, which is currently not easily representable because an empty list is just 0/NIL. So I have three solutions, none of which is that great:

* Allow empty Lists. This possibly requires a bit of code churn, but seems somewhat sensible.
* Save the amount of preallocation needed in NIL by defining that no valid list pointer can be less than, say, 0x100. That also requires some churn, and gets points for ugliness.
* Always overallocate new lists. This is rather wasteful.

Any ideas? If anybody wants the preliminary patches I am happy to send them, but I would much rather prefer to rework them after getting some input.

Andres
Re: [HACKERS] Core Extensions relocation
On 11/18/11 12:27 PM, Dimitri Fontaine wrote: > Tom Lane writes: >> Why do you figure that, exactly? The path of least resistance will >> be precisely to leave everything packaged as it is, in a single >> postgresql-contrib module. I'm pretty likely to do that myself for >> Fedora and RHEL. Subdividing/rearranging contrib makes the packager's >> life more complicated, *and* makes his users' lives more complicated, >> if only because things aren't where they were before. It seems unlikely >> to happen, at least in the near term. > > Then if we want packagers to move, what about removing all the > extensions not listed by Greg from the contrib/ directory and inventing > another place where to manage them, which is not automatically built, > but still part of buildfarm tests, if at all possible. Actually, the whole idea is that the "Core Management Extensions" should move from the -contrib module to the -server module. That is, those extensions should always get installed with any server. Of course, packagers may then reasonably ask why that code is not just part of the core? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Core Extensions relocation
Tom Lane writes: > Why do you figure that, exactly? The path of least resistance will > be precisely to leave everything packaged as it is, in a single > postgresql-contrib module. I'm pretty likely to do that myself for > Fedora and RHEL. Subdividing/rearranging contrib makes the packager's > life more complicated, *and* makes his users' lives more complicated, > if only because things aren't where they were before. It seems unlikely > to happen, at least in the near term. Then if we want packagers to move, what about removing all the extensions not listed by Greg from the contrib/ directory and inventing another place where to manage them, which is not automatically built, but still part of buildfarm tests, if at all possible. Then the only change we suggest to packagers is to have the main PostgreSQL package install the contrib one by means of dependencies. I don't much like this solution, but that's how I read your email. The status quo is not a good place to live in. > The upstream project can't force these decisions on packagers, and it > doesn't help to go about under the illusion that we can. Really? You are packaging for RHEL, Dave is responsible for the Windows packaging, Devrim is covering the other RPM systems (apart from SuSE maybe, and I'm not even sure) and Martin is caring for Debian and Ubuntu and following along. We're missing BSD ports packagers, and we're covering like 90% or more of the server and developer installs. We can't force everybody's hands into doing it our way, but I'm pretty sure we can talk to them and see what they think about the usefulness of this proposal and how they intend to react. We're not *that* disconnected. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] testing ProcArrayLock patches
Andres Freund wrote: > When doing line-level profiles I would suggest looking at the > instructions. What's the best way to do that? > I don't think cache line contention is the most likely candidate > here. Simple cache-misses seem far more likely. In combination > with pipeline stalls... > > Newer cpus (nehalem+) can measure stalled cycles which can be > really useful when analyzing performance. I don't remember how to > do that with oprofile right now though as I use perf these days > (its -e stalled-cycles{frontend|backend} there}). When I run oprofile, I still always go back to this post by Tom: http://archives.postgresql.org/pgsql-performance/2009-06/msg00154.php Can anyone provide such a "cheat sheet" for perf? I could give that a try if I knew how. -Kevin
Re: [HACKERS] testing ProcArrayLock patches
On Friday, November 18, 2011 08:36:59 PM Kevin Grittner wrote:
> "Kevin Grittner" wrote:
> > samples  %       image name  symbol name
> > 495463   3.6718  postgres    hash_search_with_hash_value
>
> When lines like these show up in the annotated version, I'm impressed that we're still finding gains as big as we are:
>
>  44613  0.3306 :    if (segp == NULL)
>                :        hash_corrupted(hashp);
> 101910  0.7552 :    keysize = hashp->keysize;    /* ditto */

When doing line-level profiles I would suggest looking at the instructions. Quite often the line shown doesn't have much to do with what is actually executed, as the compiler tries to schedule instructions cleverly. Also, in many situations the shown cost doesn't actually lie in the instruction shown but in an earlier one: the shown instruction has to wait for the results of the earlier instructions, and pipelining makes that hard to observe correctly. A simplified example would be something like:

bool func(int a, int b, int c)
{
    int res = a / b;
    if (res == c)
    {
        return true;
    }
    return false;
}

Likely the instruction showing up in the profile would be the comparison, which obviously is not the really expensive part...

> There goes over 1% of my server run time, right there!
>
> Of course, these make no sense unless there is cache line contention, which is why that area is bearing fruit.

I don't think cache line contention is the most likely candidate here. Simple cache misses seem far more likely, in combination with pipeline stalls... Newer CPUs (Nehalem+) can measure stalled cycles, which can be really useful when analyzing performance. I don't remember how to do that with oprofile right now though, as I use perf these days (it's -e stalled-cycles-{frontend|backend} there).

Andres
Re: [HACKERS] testing ProcArrayLock patches
Robert Haas wrote:
>> I think so. My take was that it was showing 32 of 64 *threads*
>> active -- the hyperthreading funkiness. Is there something in
>> particular you'd like me to check?
>
> Not really, just don't understand the number.

I'm having trouble resolving the vmstat numbers I got during the 32-client pgbench runs which modified data.

-M simple:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
30 1 4464 513492 205564572 5447212400 0 78170 621724 1246300 30 8 61 1 0
27 1 4464 509288 205564572 5447460000 0 125620 599403 1192046 29 8 63 1 0
35 1 4464 508368 205564572 5447699600 0 89801 595939 1186496 29 8 63 0 0
25 0 4464 506088 205564572 5447866800 0 90121 594800 1189649 28 8 63 0 0

-M prepared:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
28 0 5612 1204404 205107344 5423053600 0 93212 527284 1456417 22 9 69 0 0
 8 1 5612 1202044 205107344 542600 0 93217 512819 1417457 21 9 70 1 0
17 1 5612 1201892 205107344 5423604800 0 132699 502333 1412878 21 9 70 0 0
19 1 5612 1199208 205107344 5423893600 0 93612 519113 1484386 21 9 69 0 0

So 60% or 70% idle without any I/O wait time. I don't know how to explain that.

-Kevin
Re: [HACKERS] range_adjacent and discrete ranges
Jeff Davis writes: > On Fri, 2011-11-18 at 10:33 -0500, Tom Lane wrote: >> Would it be better for them to silently transform such cases to "empty"? > I wouldn't like to extend that to int4range(4,3), however. When the > upper bound is less than the lower bound, it's almost certainly a > mistake, and the user should be informed. Yeah, probably not. However, I don't like the idea of '(3,4)'::int4range throwing an error, as it currently does, because it seems to require the application to have quite a lot of knowledge of the range semantics to avoid having errors sprung on it. > By the way, what does this have to do with canonical functions? This > seems more like a constructor issue, and there is already a > zero-argument constructor to make empty ranges. What I was concerned about was whether Florian's idea of implementing range_adjacent by testing for empty intervening range would work, or would fail because of errors getting thrown. regards, tom lane
Re: [HACKERS] testing ProcArrayLock patches
"Kevin Grittner" wrote:
> samples  %       image name  symbol name
> 495463   3.6718  postgres    hash_search_with_hash_value

When lines like these show up in the annotated version, I'm impressed that we're still finding gains as big as we are:

 44613  0.3306 :    if (segp == NULL)
                :        hash_corrupted(hashp);
101910  0.7552 :    keysize = hashp->keysize;    /* ditto */

There goes over 1% of my server run time, right there! Of course, these make no sense unless there is cache line contention, which is why that area is bearing fruit.

-Kevin
Re: [HACKERS] testing ProcArrayLock patches
"anara...@anarazel.de" wrote:
> Kevin Grittner schrieb:
>> samples  %       image name  symbol name
>> 933394   4.9651  postgres    AllocSetAlloc
>> 848476   4.5134  postgres    base_yyparse
>> 719515   3.8274  postgres    SearchCatCache
> That profile looks like you ran pgbench with -m simple. How does
> it look with prepared instead?

samples  %       image name  symbol name
495463   3.6718  postgres    hash_search_with_hash_value
490971   3.6385  postgres    GetSnapshotData
443965   3.2902  postgres    LWLockAcquire
443566   3.2872  postgres    AllocSetAlloc
302388   2.2409  postgres    XLogInsert
286889   2.1261  postgres    SearchCatCache
246417   1.8262  postgres    PostgresMain
235018   1.7417  postgres    heap_page_prune
198442   1.4706  postgres    _bt_compare
181446   1.3447  postgres    hash_any
177131   1.3127  postgres    ExecInitExpr
175775   1.3026  postgres    LWLockRelease
152324   1.1288  postgres    PinBuffer
150285   1.1137  postgres    exec_bind_message
145214   1.0762  postgres    fmgr_info_cxt_security
140493   1.0412  postgres    s_lock
124162   0.9201  postgres    LockAcquireExtended
120429   0.8925  postgres    MemoryContextAlloc
117076   0.8676  postgres    pfree
116493   0.8633  postgres    AllocSetFree
105027   0.7783  postgres    pgstat_report_activity
101407   0.7515  postgres    ProcArrayLockAcquire
100797   0.7470  postgres    MemoryContextAllocZeroAligned
 98360   0.7289  postgres    ProcArrayLockRelease
 86938   0.6443  postgres    heap_hot_search_buffer
 82635   0.6124  postgres    hash_search
 79902   0.5921  postgres    errstart
 79465   0.5889  postgres    HeapTupleSatisfiesVacuum
 78709   0.5833  postgres    ResourceOwnerReleaseInternal
 76068   0.5637  postgres    ExecModifyTable
 73043   0.5413  postgres    heap_update
 72175   0.5349  postgres    strlcpy
 71253   0.5280  postgres    MemoryContextAllocZero

tps = 27392.219364 (including connections establishing)

-Kevin
Re: [HACKERS] testing ProcArrayLock patches
Kevin Grittner schrieb:
>Robert Haas wrote:
>
>> Any chance you can run oprofile (on either branch, don't really
>> care) against the 32 client test and post the results?
>
>Besides the other changes we discussed, I boosted scale to 150 and
>ran at READ COMMITTED isolation level (because all threads promptly
>crashed and burned at REPEATABLE READ -- we desperately need a
>pgbench option to retry a transaction on serialization failure).
>The oprofile hot spots at half a percent or higher:
>
>CPU: Intel Core/i7, speed 2262 MHz (estimated)
>Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with
>a unit mask of 0x00 (No unit mask) count 10
>
>samples  %       image name  symbol name
>933394   4.9651  postgres    AllocSetAlloc
>848476   4.5134  postgres    base_yyparse
>719515   3.8274  postgres    SearchCatCache
>461275   2.4537  postgres    hash_search_with_hash_value
>426411   2.2682  postgres    GetSnapshotData
>322938   1.7178  postgres    LWLockAcquire
>322236   1.7141  postgres    core_yylex
>305471   1.6249  postgres    MemoryContextAllocZeroAligned
>281543   1.4976  postgres    expression_tree_walker
>270241   1.4375  postgres    XLogInsert
>234899   1.2495  postgres    MemoryContextAlloc
>210137   1.1178  postgres    ScanKeywordLookup
>184857   0.9833  postgres    heap_page_prune
>173608   0.9235  postgres    hash_any
>153011   0.8139  postgres    _bt_compare
>144538   0.7689  postgres    nocachegetattr
>131466   0.6993  postgres    fmgr_info_cxt_security
>131001   0.6968  postgres    grouping_planner
>130808   0.6958  postgres    LWLockRelease
>124112   0.6602  postgres    PinBuffer
>120745   0.6423  postgres    LockAcquireExtended
>112992   0.6010  postgres    ExecInitExpr
>112830   0.6002  postgres    lappend
>112311   0.5974  postgres    new_list
>110368   0.5871  postgres    check_stack_depth
>106036   0.5640  postgres    AllocSetFree
>102565   0.5456  postgres    MemoryContextAllocZero
>94689    0.5037  postgres    SearchSysCache

That profile looks like you ran pgbench with -m simple. How does it look with prepared instead? 
Andres
Re: [HACKERS] testing ProcArrayLock patches
Robert Haas wrote: > Any chance you can run oprofile (on either branch, don't really > care) against the 32 client test and post the results? Besides the other changes we discussed, I boosted scale to 150 and ran at READ COMMITTED isolation level (because all threads promptly crashed and burned at REPEATABLE READ -- we desperately need a pgbench option to retry a transaction on serialization failure). The oprofile hot spots at half a percent or higher:

CPU: Intel Core/i7, speed 2262 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 10

samples  %       image name  symbol name
933394   4.9651  postgres    AllocSetAlloc
848476   4.5134  postgres    base_yyparse
719515   3.8274  postgres    SearchCatCache
461275   2.4537  postgres    hash_search_with_hash_value
426411   2.2682  postgres    GetSnapshotData
322938   1.7178  postgres    LWLockAcquire
322236   1.7141  postgres    core_yylex
305471   1.6249  postgres    MemoryContextAllocZeroAligned
281543   1.4976  postgres    expression_tree_walker
270241   1.4375  postgres    XLogInsert
234899   1.2495  postgres    MemoryContextAlloc
210137   1.1178  postgres    ScanKeywordLookup
184857   0.9833  postgres    heap_page_prune
173608   0.9235  postgres    hash_any
153011   0.8139  postgres    _bt_compare
144538   0.7689  postgres    nocachegetattr
131466   0.6993  postgres    fmgr_info_cxt_security
131001   0.6968  postgres    grouping_planner
130808   0.6958  postgres    LWLockRelease
124112   0.6602  postgres    PinBuffer
120745   0.6423  postgres    LockAcquireExtended
112992   0.6010  postgres    ExecInitExpr
112830   0.6002  postgres    lappend
112311   0.5974  postgres    new_list
110368   0.5871  postgres    check_stack_depth
106036   0.5640  postgres    AllocSetFree
102565   0.5456  postgres    MemoryContextAllocZero
 94689   0.5037  postgres    SearchSysCache

Do you want line numbers or lower percentages? Two runs:

tps = 21946.961196 (including connections establishing)
tps = 22911.873227 (including connections establishing)

For write transactions, that seems pretty respectable. 
-Kevin
Re: [HACKERS] COUNT(*) and index-only scans
On 12 October 2011 17:26, Robert Haas wrote: > On Wed, Oct 12, 2011 at 11:59 AM, Tom Lane wrote: >> The place where the decision is actually somewhat hard, IMO, is where >> you're pulling a small part of the table but significantly more than one >> row, and the traditional best choice would be a bitmap scan. Now we >> have to guess whether possibly avoiding heap fetches is better than >> batching the fetches, and it doesn't seem open-and-shut to me. > > Yes, I agree. > > I was actually wondering if there is some way we could make index-only > scans work for bitmap index scans. Something like this: If the index > is covering, then as we retrieve each tuple, we check whether the page > is all-visible. If so, we return the data from the index tuple. If > not, we save the TID for later. Then, when we get done scanning the > index, we go back and fetch all the pages containing saved TIDs in > ascending block number order. The trouble is that I think you could > get in some trouble if you use a TID bitmap here, because if you > lossify the bitmap then you have to make sure you can't return a tuple > that you already handled with the index-only approach (imagine that > the visibility map bit gets cleared partway through the scan). All in > all this seems pretty complicated... So is there a chance of getting bitmap index-only scans? -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935 EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] FlexLocks
On Fri, Nov 18, 2011 at 10:29 PM, Robert Haas wrote: > > So the upside and downside of this approach is that it modifies the > existing LWLock implementation rather than allowing multiple lock > implementations to exist side-by-side. That means every LWLock in the > system has access to this functionality, which might be convenient if > there turn out to be many uses for this technique. Right. > The bad news is > that everyone pays the cost of checking the work queue in > LWLockRelease(). I hope that would be minimal (maybe just one instruction) for those who don't want to use the facility. > It also means that you can't, for example, create a > custom lock with different lock modes (e.g. S, SX, X, as I proposed > upthread). > That's a valid point. > I am pretty dubious that there are going to be very many cases where > we can get away with holding the spinlock while doing the work. For > example, WAL flush is a clear example of where we can optimize away > spinlock acquisitions - if we communicate to people we wake up that > their LSN is already flushed, they needn't reacquire the lock to > check. But we certainly can't hold a spinlock across a WAL flush. > I think that's mostly solvable, as said upthread. We can and should improve this mechanism so that the work is carried out holding the necessary LWLock instead of the spinlock. That would let other processes queue up for the lock while the tasks are being executed. Or, if the tasks only need a shared lock, other normal shared requesters can go ahead and acquire the lock. When I get some time, I would like to see if this can be extended to have shared snapshots, so that multiple callers of GetSnapshotData() get the same snapshot, computed only once by scanning the proc array, instead of having each process compute its own snapshot, which remains the same unless some transaction ends in between.
Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing ProcArrayLock patches
Robert Haas wrote: > Yeah, I'd just drop -S. Easily done. > Make sure to use -c N -j N with pgbench, or you'll probably not be > able to saturate it. Yeah, that's part of the script I copied from you. > I've also had good luck with wal_writer_delay=20ms, although if > you have synchronous_commit=on that might not matter, and it's > much less important since Simon's recent patch in that area went > in. What the heck; will do. > What scale factor are you testing at? 100. Perhaps I should boost that since I'm going as far as 128 clients? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: better support for debugging of overloaded functions
2011/11/18 Robert Haas : > On Fri, Nov 18, 2011 at 6:24 AM, Pavel Stehule > wrote: >> CONTEXT: PL/pgSQL function "assign_rslts" line 50 at assignment (oid: 65903) >> >> \sf+ 65903 > > I'm pretty unenthused by the idea of making OIDs more user-visible > than they already are. If the message is ambiguous, we should include > argument types and (if not the object that would be visible under the > current search_path) a schema qualification. Spitting out a five (or > six or seven or eight) digit number doesn't seem like a usability > improvement. > yes - it's not nice - but it is simple and robust and doesn't depend on the actual search_path setting. A nicer solution is a function signature - it can be assembled when the function is compiled. I see only one disadvantage - a signature can be too wide and can depend on search_path (and search_path can be different when the function is executed and when someone runs the sql console). The signature should be prepared before execution, because there is no access to system tables after an exception. I'd be happy with any solution, because debugging of overloaded functions is terrible now. Regards Pavel > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing ProcArrayLock patches
On Fri, Nov 18, 2011 at 12:45 PM, Kevin Grittner wrote: > OK. Sorry for misunderstanding that. I haven't gotten around to a > deep reading of the patch yet. :-( I based this on the test script > you posted here (with slight modifications for my preferred > directory structures): > > http://archives.postgresql.org/pgsql-hackers/2011-10/msg00605.php > > If I just drop the -S switch will I have a good test, or are there > other adjustments I should make (besides increasing checkpoint > segments)? (Well, for the SELECT-only test I didn't bother putting > pg_xlog on a separate RAID 10 on its own BBU controller as we > normally would for this machine, I'll cover that, too.) Yeah, I'd just drop -S. Make sure to use -c N -j N with pgbench, or you'll probably not be able to saturate it. I've also had good luck with wal_writer_delay=20ms, although if you have synchronous_commit=on that might not matter, and it's much less important since Simon's recent patch in that area went in. What scale factor are you testing at? >> It doesn't make any sense for PostgreSQL master to be using only >> 50% of the CPU and leaving the rest idle on a lots-of-clients >> SELECT-only test. That could easily happen on 9.1, but my lock >> manager changes eliminated the only place where anything gets put >> to sleep in that path (except for the emergency sleeps done by >> s_lock, when a spinlock is really badly contended). So I'm >> confused by these results. Are we sure that the processes are >> being scheduled across all 32 physical cores? > > I think so. My take was that it was showing 32 of 64 *threads* > active -- the hyperthreading funkiness. Is there something in > particular you'd like me to check? Not really, just don't understand the number. >> At any rate, I do think it's likely that you're being bitten by >> spinlock contention, but we'd need to do some legwork to verify >> that and work out the details. 
Any chance you can run oprofile >> (on either branch, don't really care) against the 32 client test >> and post the results? If it turns out s_lock is at the top of the >> heap, I can put together a patch to help figure out which spinlock >> is the culprit. > > oprofile isn't installed on this machine. I'll take care of that > and post results when I can. OK. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing ProcArrayLock patches
Robert Haas wrote: > Kevin Grittner wrote: >>> Then again, is this a regular pgbench test or is this >>> SELECT-only? >> >> SELECT-only > > Ah, OK. I would not expect flexlocks to help with that; Pavan's > patch might, though. OK. Sorry for misunderstanding that. I haven't gotten around to a deep reading of the patch yet. :-( I based this on the test script you posted here (with slight modifications for my preferred directory structures): http://archives.postgresql.org/pgsql-hackers/2011-10/msg00605.php If I just drop the -S switch, will I have a good test, or are there other adjustments I should make (besides increasing checkpoint segments)? (Well, for the SELECT-only test I didn't bother putting pg_xlog on a separate RAID 10 on its own BBU controller as we normally would for this machine; I'll cover that, too.) > It doesn't make any sense for PostgreSQL master to be using only > 50% of the CPU and leaving the rest idle on a lots-of-clients > SELECT-only test. That could easily happen on 9.1, but my lock > manager changes eliminated the only place where anything gets put > to sleep in that path (except for the emergency sleeps done by > s_lock, when a spinlock is really badly contended). So I'm > confused by these results. Are we sure that the processes are > being scheduled across all 32 physical cores? I think so. My take was that it was showing 32 of 64 *threads* active -- the hyperthreading funkiness. Is there something in particular you'd like me to check? > At any rate, I do think it's likely that you're being bitten by > spinlock contention, but we'd need to do some legwork to verify > that and work out the details. Any chance you can run oprofile > (on either branch, don't really care) against the 32 client test > and post the results? If it turns out s_lock is at the top of the > heap, I can put together a patch to help figure out which spinlock > is the culprit. oprofile isn't installed on this machine. I'll take care of that and post results when I can. 
-Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: better support for debugging of overloaded functions
On Fri, Nov 18, 2011 at 6:24 AM, Pavel Stehule wrote: > CONTEXT: PL/pgSQL function "assign_rslts" line 50 at assignment (oid: 65903) > > \sf+ 65903 I'm pretty unenthused by the idea of making OIDs more user-visible than they already are. If the message is ambiguous, we should include argument types and (if not the object that would be visible under the current search_path) a schema qualification. Spitting out a five (or six or seven or eight) digit number doesn't seem like a usability improvement. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing ProcArrayLock patches
On Fri, Nov 18, 2011 at 12:03 PM, Kevin Grittner wrote: >> Then again, is this a regular pgbench test or is this SELECT-only? > > SELECT-only Ah, OK. I would not expect flexlocks to help with that; Pavan's patch might, though. >> Can you by any chance check top or vmstat during the 32-client >> test and see what percentage you have of user time/system >> time/idle time? > > You didn't say whether you wanted master or flexlock, but it turned > out that any difference was way too far into the noise to show. > They both looked like this: >
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b  swpd    free      buff     cache   si  so  bi  bo    in      cs   us sy id wa st
> 38  0  352  1157400 207177020 52360472    0   0   0  16 13345 1190230   40  7 53  0  0
> 37  0  352  1157480 207177020 52360472    0   0   0   0 12953 1263310   40  8 52  0  0
> 36  0  352  1157484 207177020 52360472    0   0   0   0 13411 1233365   38  7 54  0  0
> 37  0  352  1157476 207177020 52360472    0   0   0   0 12780 1193575   41  7 51  0  0
>
> Keep in mind that while there are really 32 cores, the cpu > percentages seem to be based on the "threads" from hyperthreading. > Top showed pgbench (running on the same machine) as eating a pretty > steady 5.2 of the cores, leaving 26.8 cores to actually drive the 32 > postgres processes. It doesn't make any sense for PostgreSQL master to be using only 50% of the CPU and leaving the rest idle on a lots-of-clients SELECT-only test. That could easily happen on 9.1, but my lock manager changes eliminated the only place where anything gets put to sleep in that path (except for the emergency sleeps done by s_lock, when a spinlock is really badly contended). So I'm confused by these results. Are we sure that the processes are being scheduled across all 32 physical cores? At any rate, I do think it's likely that you're being bitten by spinlock contention, but we'd need to do some legwork to verify that and work out the details. 
Any chance you can run oprofile (on either branch, don't really care) against the 32 client test and post the results? If it turns out s_lock is at the top of the heap, I can put together a patch to help figure out which spinlock is the culprit. Anyway, this is probably a digression as it relates to FlexLocks: those are not optimizing for a read-only workload. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing ProcArrayLock patches
"Kevin Grittner" wrote: > We have a 32-core Intel box (4 x X7560 @ 2.27GHz) with 256 GB > RAM. In case anyone cares, this is the same box for which I posted STREAM test results a while back. The PostgreSQL tests seem to peak on this 32-core box at 64 clients, while the STREAM test of raw RAM speed kept increasing up to 128 clients. Overall, though, it's impressive how close PostgreSQL is now coming to the raw RAM access speed curve. http://archives.postgresql.org/pgsql-hackers/2011-08/msg01306.php -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing ProcArrayLock patches
Robert Haas wrote: > Then again, is this a regular pgbench test or is this SELECT-only? SELECT-only > Can you by any chance check top or vmstat during the 32-client > test and see what percentage you have of user time/system > time/idle time? You didn't say whether you wanted master or flexlock, but it turned out that any difference was way too far into the noise to show. They both looked like this:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b  swpd    free      buff     cache   si  so  bi  bo    in      cs   us sy id wa st
38  0  352  1157400 207177020 52360472    0   0   0  16 13345 1190230   40  7 53  0  0
37  0  352  1157480 207177020 52360472    0   0   0   0 12953 1263310   40  8 52  0  0
36  0  352  1157484 207177020 52360472    0   0   0   0 13411 1233365   38  7 54  0  0
37  0  352  1157476 207177020 52360472    0   0   0   0 12780 1193575   41  7 51  0  0

Keep in mind that while there are really 32 cores, the cpu percentages seem to be based on the "threads" from hyperthreading. Top showed pgbench (running on the same machine) as eating a pretty steady 5.2 of the cores, leaving 26.8 cores to actually drive the 32 postgres processes. > What OS are you running?

Linux new-CIR 2.6.32.43-0.4-default #1 SMP 2011-07-14 14:47:44 +0200 x86_64 x86_64 x86_64 GNU/Linux

SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 1

-Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FlexLocks
On Fri, Nov 18, 2011 at 6:26 AM, Pavan Deolasee wrote: > My apologies for hijacking the thread, but the work seems quite > related, so I thought I should post here instead of starting a new > thread. > > Here is a WIP patch based on the idea of having a shared Q. A process > trying to access the shared memory protected by an LWLock sets up the > task in its PGPROC and calls a new API, LWLockExecute(). If the LWLock > is available, the task is performed immediately and the function > returns. Otherwise, the process queues itself up on the lock. When the > last shared lock holder or the exclusive lock holder calls > LWLockRelease(), it scans through such pending tasks, executes them > via a callback mechanism and wakes all those processes along with any > other normal waiter(s) waiting on LWLockAcquire(). > > I have only coded for ProcArrayEndTransaction, but it should be fairly > easy to extend the usage to some more places, especially those which > do some simple modifications to the protected area. I don't propose > to use the technique for every user of LWLock, but there can be some > obvious candidates, including this one that Robert found. > > I see a 35-40% improvement for 32-80 clients on a 5-minute pgbench -N > run with a scale factor of 100 and permanent tables. This is on a > 32-core HP IA box. > > There are a few things that need some deliberation. The pending tasks > are right now executed while holding the mutex (spinlock). This is > good and bad for obvious reasons. We can possibly change that so that > the work is done without holding the spinlock, or leave it to the caller > to choose the behavior. Doing it without holding the spinlock will > make the technique interesting for many more callers. We can also > rework the task execution so that pending similar requests from > multiple callers can be combined and executed with a single callback, > if the caller knows it's safe to do so. 
I haven't thought through the > API/callback changes to support that, but it's definitely possible and > could be quite useful in many cases. For example, the status of many > transactions can be checked with a single lookup of the ProcArray. Or > WAL inserts from multiple processes can be combined and written at > once. So the upside and downside of this approach is that it modifies the existing LWLock implementation rather than allowing multiple lock implementations to exist side-by-side. That means every LWLock in the system has access to this functionality, which might be convenient if there turn out to be many uses for this technique. The bad news is that everyone pays the cost of checking the work queue in LWLockRelease(). It also means that you can't, for example, create a custom lock with different lock modes (e.g. S, SX, X, as I proposed upthread). I am pretty dubious that there are going to be very many cases where we can get away with holding the spinlock while doing the work. For example, WAL flush is a clear example of where we can optimize away spinlock acquisitions - if we communicate to people we wake up that their LSN is already flushed, they needn't reacquire the lock to check. But we certainly can't hold a spinlock across a WAL flush. The nice thing about the FlexLock approach is that it permits fine-grained control over these types of policies: one lock type can switch to exclusive mode, do the work, and then reacquire the spinlock to hand off the baton; another can do the work while holding the spinlock; and still a third can forget about work queues altogether but introduce additional lock modes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] range_adjacent and discrete ranges
On Fri, 2011-11-18 at 13:32 +0100, Florian Pflug wrote: > That information, however, *is* already contained in the canonical > functions, because those function know that (2,3) are empty as an integer > range, but non-empty as a float range. Very good point. Thank you. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] range_adjacent and discrete ranges
On Fri, 2011-11-18 at 10:33 -0500, Tom Lane wrote: > regression=# select int4range(4,4,'(]'); > ERROR: range lower bound must be less than or equal to range upper bound > regression=# select int4range(4,4,'()'); > ERROR: range lower bound must be less than or equal to range upper bound > > Would it be better for them to silently transform such cases to "empty"? That had crossed my mind, but I read the first as saying that it includes 4 and doesn't include 4, which is a little confusing. But I wouldn't object to making them return empty ranges. Seeing that we removed some other errors in favor of returning something, it might be a little more consistent to return empty when possible. I wouldn't like to extend that to int4range(4,3), however. When the upper bound is less than the lower bound, it's almost certainly a mistake, and the user should be informed. By the way, what does this have to do with canonical functions? This seems more like a constructor issue, and there is already a zero-argument constructor to make empty ranges. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] OidFunctionCall* returning null.
I just found this thread: http://archives.postgresql.org/pgsql-general/2011-11/msg00424.php So I'll use the same workaround. Nothing to see here, folks, move along d On Fri, Nov 18, 2011 at 11:17 AM, David Zwarg wrote: > Hello, > > I have been working with the PostGIS developers, and I'm implementing a > facility to use 'callback' functions to process cells in a raster image. > > I have implemented this behind the scenes as a C function that calls a > provided sql regprocedure with OidFunctionCall*. I have been reading the > docs, and it states that "Note that neither arguments nor result are > allowed to be NULL." > > In this use case, there are legitimate reasons to return NULL from a > 'callback' function -- is there an alternative method that I should use, > instead of OidFunctionCall*? > > Thanks, > David >
Re: [HACKERS] testing ProcArrayLock patches
On Fri, Nov 18, 2011 at 11:26 AM, Kevin Grittner wrote: > Robert Haas wrote: >> Nate Boley's AMD 6128 box (which has 32 cores) and an HP Integrity >> server (also with 32 cores). > >> [clear improvement with flexlock patch] > > Hmm. We have a 32-core Intel box (4 x X7560 @ 2.27GHz) with 256 GB > RAM. It's about a week from going into production, at which point > it will be extremely hard to schedule such tests, but for a few days > more I've got shots at it. The flexlock patch doesn't appear to be > such a clear win here. > > I started from Robert's tests, but used these settings so that I > could go to higher client counts and better test serializable > transactions. Everything is fully cached. >
> max_connections = 200
> max_pred_locks_per_transaction = 256
> shared_buffers = 8GB
> maintenance_work_mem = 1GB
> checkpoint_segments = 30
> checkpoint_timeout = 15min
> checkpoint_completion_target = 0.9
> seq_page_cost = 0.1
> random_page_cost = 0.1
> cpu_tuple_cost = 0.05
> effective_cache_size = 40GB
> default_transaction_isolation = '$iso'

I had a dismaying benchmarking experience recently that involved settings very similar to the ones you've got there - in particular, I also had checkpoint_segments set to 30. When I raised it to 300, performance improved dramatically at 8 clients and above. Then again, is this a regular pgbench test or is this SELECT-only? Because the absolute numbers you're posting are vastly higher than anything I've ever seen on a write test. Can you by any chance check top or vmstat during the 32-client test and see what percentage you have of user time/system time/idle time? What OS are you running? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing ProcArrayLock patches
Robert Haas wrote: > Nate Boley's AMD 6128 box (which has 32 cores) and an HP Integrity > server (also with 32 cores). > [clear improvement with flexlock patch] Hmm. We have a 32-core Intel box (4 x X7560 @ 2.27GHz) with 256 GB RAM. It's about a week from going into production, at which point it will be extremely hard to schedule such tests, but for a few days more I've got shots at it. The flexlock patch doesn't appear to be such a clear win here. I started from Robert's tests, but used these settings so that I could go to higher client counts and better test serializable transactions. Everything is fully cached.

max_connections = 200
max_pred_locks_per_transaction = 256
shared_buffers = 8GB
maintenance_work_mem = 1GB
checkpoint_segments = 30
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
seq_page_cost = 0.1
random_page_cost = 0.1
cpu_tuple_cost = 0.05
effective_cache_size = 40GB
default_transaction_isolation = '$iso'

Serializable results not shown here -- that's to gather information for trying to improve SSI locking. 
m1   tps = 7847.834544 (including connections establishing)
f1   tps = 7917.225382 (including connections establishing)
m2   tps = 18672.145526 (including connections establishing)
f2   tps = 17486.435322 (including connections establishing)
m4   tps = 34371.278253 (including connections establishing)
f4   tps = 34465.898173 (including connections establishing)
m8   tps = 68228.261694 (including connections establishing)
f8   tps = 68505.285830 (including connections establishing)
m16  tps = 127449.815100 (including connections establishing)
f16  tps = 127208.939670 (including connections establishing)
m32  tps = 201738.209348 (including connections establishing)
f32  tps = 201637.237903 (including connections establishing)
m64  tps = 380326.800557 (including connections establishing)
f64  tps = 380628.429408 (including connections establishing)
m80  tps = 366628.197546 (including connections establishing)
f80  tps = 162594.012051 (including connections establishing)
m96  tps = 360922.948775 (including connections establishing)
f96  tps = 366728.987041 (including connections establishing)
m128 tps = 352159.631878 (including connections establishing)
f128 tps = 355475.129448 (including connections establishing)

I did five runs each and took the median. In most cases, the values were pretty close to one another in a group, so confidence is pretty high that this is meaningful. There were a few anomalies where performance for one or more samples was horrid. This seems consistent with the theory of pathological pileups on the LW locks (or also flexlocks?). 
The problem groups:

m64 tps = 380407.768906 (including connections establishing)
m64 tps = 79197.470389 (including connections establishing)
m64 tps = 381112.194105 (including connections establishing)
m64 tps = 378579.036542 (including connections establishing)
m64 tps = 380326.800557 (including connections establishing)

m96 tps = 360582.945291 (including connections establishing)
m96 tps = 363021.805138 (including connections establishing)
m96 tps = 362468.870516 (including connections establishing)
m96 tps = 59614.322351 (including connections establishing)
m96 tps = 360922.948775 (including connections establishing)

f80 tps = 158905.149822 (including connections establishing)
f80 tps = 157192.460599 (including connections establishing)
f80 tps = 370757.790443 (including connections establishing)
f80 tps = 162594.012051 (including connections establishing)
f80 tps = 372170.638516 (including connections establishing)

f96 tps = 366804.733788 (including connections establishing)
f96 tps = 366728.987041 (including connections establishing)
f96 tps = 365490.380848 (including connections establishing)
f96 tps = 366770.193305 (including connections establishing)
f96 tps = 125225.371140 (including connections establishing)

So the lows don't seem to be as low when they happen with the flexlock patch, but they still happen -- possibly more often? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] OidFunctionCall* returning null.
Hello, I have been working with the PostGIS developers, and I'm implementing a facility to use 'callback' functions to process cells in a raster image. I have implemented this behind the scenes as a C function that calls a provided sql regprocedure with OidFunctionCall*. I have been reading the docs, and it states that "Note that neither arguments nor result are allowed to be NULL." In this use case, there are legitimate reasons to return NULL from a 'callback' function -- is there an alternative method that I should use, instead of OidFunctionCall*? Thanks, David
Re: [HACKERS] WIP: Collecting statistics on CSV file data
2011/11/18 Shigeru Hanada : > - I couldn't see the reason why file_fdw sets ctid of sample tuples, > though I guess it's for Vitter's random sampling algorithm. If every > FDW must set valid ctid to sample tuples, it should be mentioned in > document of AnalyzeForeignTable. Exporting some functions from > analyze.c relates this issue? If every FDW must set valid ctid to sample tuples, it should be fixed so that they don't have to, I would think. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] range_adjacent and discrete ranges
Florian Pflug writes: > ...This definition does not depend on any specific canonical form of ranges, > only on the canonicalize function's ability to detect empty ranges. Hmm, well, now that you mention it, I don't think the current canonical functions handle empty ranges very nicely at all. They tend to spit up:

regression=# select int4range(4,4,'[]');
 int4range
-----------
 [4,5)
(1 row)

regression=# select int4range(4,4,'(]');
ERROR:  range lower bound must be less than or equal to range upper bound
regression=# select int4range(4,4,'()');
ERROR:  range lower bound must be less than or equal to range upper bound

Would it be better for them to silently transform such cases to "empty"? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] VACUUM touching file but not updating relation
On Fri, Nov 18, 2011 at 3:18 PM, Tom Lane wrote: > What Thom's complaining about is that the buffer may be marked dirty > unnecessarily, ie when there has been no actual data change. OK, I'll patch it. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] VACUUM touching file but not updating relation
Simon Riggs writes: > On Fri, Nov 18, 2011 at 2:47 PM, Tom Lane wrote: >> Well, it's expected given the current coding in the btree vacuum logic. >> It's not clear to me why it was written like that, though. > The code works as designed. > _bt_delitems_vacuum() is only ever called with nitems == 0 when it is > the last block of the relation with wal_level = hot standby > As discussed in the comments we must issue a WAL record for the last > block, whatever else has occurred. > So the correct number of WAL records is emitted and I see no bug there. What Thom's complaining about is that the buffer may be marked dirty unnecessarily, ie when there has been no actual data change. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] VACUUM touching file but not updating relation
On Fri, Nov 18, 2011 at 2:47 PM, Tom Lane wrote: > Thom Brown writes: >>> On 11 November 2011 23:28, Tom Lane wrote: I observe that _bt_delitems_vacuum() unconditionally dirties the page and writes a WAL record, whether it has anything to do or not; and that if XLogStandbyInfoActive() then btvacuumscan will indeed call it despite there being (probably) nothing useful to do. Seems like that could be improved. The comment explaining why it's necessary to do that doesn't make any sense to me, either. > >>> Well the effect, in the single instances I've checked, is certainly >>> more pronounced for hot_standby, but there still appears to be some >>> occurrences for minimal wal_level too. > >> So would you say this is acceptable and normal activity, or is >> something awry here? > > Well, it's expected given the current coding in the btree vacuum logic. > It's not clear to me why it was written like that, though. The code works as designed. _bt_delitems_vacuum() is only ever called with nitems == 0 when it is the last block of the relation with wal_level = hot standby As discussed in the comments we must issue a WAL record for the last block, whatever else has occurred. So the correct number of WAL records is emitted and I see no bug there. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] VACUUM touching file but not updating relation
On Fri, Nov 18, 2011 at 2:47 PM, Tom Lane wrote: > Thom Brown writes: >>> On 11 November 2011 23:28, Tom Lane wrote: I observe that _bt_delitems_vacuum() unconditionally dirties the page and writes a WAL record, whether it has anything to do or not; and that if XLogStandbyInfoActive() then btvacuumscan will indeed call it despite there being (probably) nothing useful to do. Seems like that could be improved. The comment explaining why it's necessary to do that doesn't make any sense to me, either. > >>> Well the effect, in the single instances I've checked, is certainly >>> more pronounced for hot_standby, but there still appears to be some >>> occurrences for minimal wal_level too. > >> So would you say this is acceptable and normal activity, or is >> something awry here? > > Well, it's expected given the current coding in the btree vacuum logic. > It's not clear to me why it was written like that, though. I'll take a look. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Do missed autoheader run for previous commit.
Alvaro Herrera writes: > Excerpts from Tom Lane's message of vie nov 18 11:12:32 -0300 2011: >>> Hmm, does the win32 file need updating too? >> I don't see HAVE_SCANDIR in there, do you? > Well, I wonder if the win32 file needs to be hooked to the whole > autoconf/autoheader thing somehow. I mean, if HAVE_SCANDIR wasn't > there, does this mean that when it was added to pg_config.h we forgot to > update the win32 copy? Well, it might mean that, or it might mean that somebody intentionally didn't bother to add the symbol because Windows hasn't got the function. Yeah, it would be nice if autoconf worked for Windows ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] VACUUM touching file but not updating relation
Thom Brown writes: >> On 11 November 2011 23:28, Tom Lane wrote: >>> I observe that _bt_delitems_vacuum() unconditionally dirties the page >>> and writes a WAL record, whether it has anything to do or not; and that >>> if XLogStandbyInfoActive() then btvacuumscan will indeed call it despite >>> there being (probably) nothing useful to do. Seems like that could be >>> improved. The comment explaining why it's necessary to do that doesn't >>> make any sense to me, either. >> Well the effect, in the single instances I've checked, is certainly >> more pronounced for hot_standby, but there still appears to be some >> occurrences for minimal wal_level too. > So would you say this is acceptable and normal activity, or is > something awry here? Well, it's expected given the current coding in the btree vacuum logic. It's not clear to me why it was written like that, though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] vpath builds and verbose error messages
Alvaro Herrera writes: > Excerpts from Peter Eisentraut's message of vie nov 18 01:34:18 -0300 2011: >> When using verbose error messages (psql \set VERBOSITY verbose) with a >> vpath build, you get this sort of thing: >> LOCATION: transformColumnRef, >> /build/buildd-postgresql-9.1_9.1.1-3-i386-AP0ovQ/postgresql-9.1-9.1.1/build/../src/backend/parser/parse_expr.c:766 >> >> Can we shorten that path somehow? Either in the C code when printing it >> out, or during the build. Any ideas? > Huh, I hadn't noticed, even though I see those all the time. I agree > with shortening the path so that it's relative to the root source dir, > if that's what you propose: In a non-vpath build you just get the file name (or at least that's what I'm used to seeing). I agree with Peter that a full path seems a bit over the top. It wouldn't be that hard for elog.c to do strrchr(fname, '/') or something like that, but the idea that there are hundreds of full-path strings embedded in the executable is a bit annoying. I guess we could hope that the compiler is bright enough to store it only once per source file, so the actual space requirement may not be all *that* bad. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Core Extensions relocation
Greg Smith writes: > On 11/17/2011 03:18 PM, Peter Eisentraut wrote: >> Who's to say that after this, the core extensions won't end up in a new >> separate package postgresql-extensions (or similar) or might even stay >> in postgresql-contrib, for compatibility? > I don't know why packagers would make an active decision that will make > their lives more difficult, with no benefit to them and a regression > against project recommendations for their users. Why do you figure that, exactly? The path of least resistance will be precisely to leave everything packaged as it is, in a single postgresql-contrib module. I'm pretty likely to do that myself for Fedora and RHEL. Subdividing/rearranging contrib makes the packager's life more complicated, *and* makes his users' lives more complicated, if only because things aren't where they were before. It seems unlikely to happen, at least in the near term. > And if some wanted to wander this way, they'll end up having to maintain > a doc patch to address the fact that they've broken with project > recommendations. This text in what I submitted will no longer be true: You're assuming anybody will notice or care about that text, if indeed it gets committed/released with that wording at all. The upstream project can't force these decisions on packagers, and it doesn't help to go about under the illusion that we can. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RangeVarGetRelid()
On Fri, Nov 18, 2011 at 08:58:30AM -0500, Robert Haas wrote: > On Fri, Nov 18, 2011 at 8:37 AM, Noah Misch wrote: > > I count 1/25 callers overriding nowait and 3/25 overriding missing_ok. So, > > it's > > looking like a less-common override than the callback function will come to > > be. > > Yeah, you're probably right. However, I think there's another good > reason not to use that signature: in 9.1, the function had a signature > of (RangeVar *, bool). If in 9.2 it ends up with a signature of > (RangeVar *, LOCKMODE), you won't get a compiler warning (at least not > on my system) if you invoke it as RangeVarGetRelid(rel, true). You'll > just get silently (and subtly) different behavior: an error if the > relation doesn't exist (instead of no error), and an AccessShareLock > if it does (instead of no lock). I think we're best off making sure > that any old-style usage of RangeVarGetRelid() that may be out there > in third-party code fails to compile. Good call. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inlining comparators as a performance optimisation
Simon Riggs writes: > On Fri, Nov 18, 2011 at 5:20 AM, Robert Haas wrote: >> I think that we should really consider doing with this patch what Tom >> suggested upthread; namely, looking for a mechanism to allow >> individual datatypes to offer up a comparator function that doesn't >> require bouncing through FunctionCall2Coll(). > I don't think its credible to implement that kind of generic > improvement at this stage of the release cycle. Er, *what*? We're in mid development cycle, we are nowhere near release. When exactly would you have us make major changes? In any case, what I understood Robert to be proposing was an add-on feature that could be implemented in one datatype at a time. Not a global flag day. We couldn't really do the latter anyway without making life very unpleasant for authors of extension datatypes. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RangeVarGetRelid()
On Fri, Nov 18, 2011 at 8:37 AM, Noah Misch wrote: > I count 1/25 callers overriding nowait and 3/25 overriding missing_ok. So, > it's > looking like a less-common override than the callback function will come to > be. Yeah, you're probably right. However, I think there's another good reason not to use that signature: in 9.1, the function had a signature of (RangeVar *, bool). If in 9.2 it ends up with a signature of (RangeVar *, LOCKMODE), you won't get a compiler warning (at least not on my system) if you invoke it as RangeVarGetRelid(rel, true). You'll just get silently (and subtly) different behavior: an error if the relation doesn't exist (instead of no error), and an AccessShareLock if it does (instead of no lock). I think we're best off making sure that any old-style usage of RangeVarGetRelid() that may be out there in third-party code fails to compile. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] VACUUM touching file but not updating relation
On 12 November 2011 00:08, Thom Brown wrote: > On 11 November 2011 23:28, Tom Lane wrote: >> Thom Brown writes: >>> On 11 November 2011 00:55, Tom Lane wrote: Thom Brown writes: > I just noticed that the VACUUM process touches a lot of relations > (affects mtime) but for one file I looked at, it didn't change. This > doesn't always happen, and many relations aren't touched at all. >> No immediate ideas as to why the mtime would change if the file contents didn't. It seems like there must be a code path that marked a buffer dirty without having changed it, but we're usually pretty careful about that. >> >>> I checked all files where the time stamp of the file had changed, but >>> had the same MD5 sum. I used the list in the query you mentioned and >>> get: [ mostly indexes ] >> >> Hmm, is this on a hot standby master? > > It's using a wal_level of hot_standby and has max_wal_senders set to > 2, but it's not actually replicating to anywhere else. But if I > comment out both of these, restart, then compare pre-vacuum and > post-vacuum, I get the following results for unchanged but touched > items: > > test=# select oid,relname from pg_class where relfilenode in > (11680,11682,11684,11686,11690,16530); > oid | relname > ---+- > 2619 | pg_statistic > 2840 | pg_toast_2619 > 2841 | pg_toast_2619_index > 16530 | cows2 > (4 rows) > > The items which didn't match a result in this instance were 11686 and > 11690, which is surprising since they both have a visibility map and > free space map, indicating they're some kind of table. > >> I observe that _bt_delitems_vacuum() unconditionally dirties the page >> and writes a WAL record, whether it has anything to do or not; and that >> if XLogStandbyInfoActive() then btvacuumscan will indeed call it despite >> there being (probably) nothing useful to do. Seems like that could be >> improved. The comment explaining why it's necessary to do that doesn't >> make any sense to me, either. 
> > Well the effect, in the single instances I've checked, is certainly > more pronounced for hot_standby, but there still appears to be some > occurrences for minimal wal_level too. So would you say this is acceptable and normal activity, or is something awry here? -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935 EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inlining comparators as a performance optimisation
On Fri, Nov 18, 2011 at 3:53 AM, Simon Riggs wrote: > We have no proof that we need to do this for 10 or 100 data types. We > only currently have proof that there is gain for the most common > types. Well, that's kind of my point. I think this needs more work before we decide what the best approach is. So far, the ONLY test result we have that compares inlining to not inlining shows a speedup from 60 ms to 52 ms. I think that an 8 ms speedup on one test with one datatype on one platform/compiler combination isn't sufficient evidence to conclude that this is the best possible approach. I think the way to look at this is that this patch contains two possibly good ideas: one of which is to make the qsort() argument match the qsort() calling convention, and the other of which is to have multiple copies of the qsort() logic that inline the comparators for their respective datatypes. Tom hypothesized that most of the benefit was in the first idea, and the numbers that Peter posted seem to support that conclusion. The first idea is also less invasive and more broadly applicable, so to me that seems like the first thing to pursue. Now, that doesn't mean that we shouldn't consider the second idea as well, but I don't believe that the evidence presented thus far is sufficient to prove that we should go that route. It seems entirely possible that inlining any non-trivial comparator function could work out to a loss, or that the exact choice of compiler flags could affect whether inlining works out to a plus or a minus (what happens with -O3 vs. -O2? what about -O1? what about -O2 -fno-omit-frame-pointer? what if they're using HP's aCC or Intel's icc rather than gcc?). There's no point in installing an optimization that could easily be a pessimization on some other workload, and that hasn't been tested, or at least no results have been posted here. 
On the other hand, matching the calling convention of the comparator function to what qsort() wants and eliminating the trampoline seems absolutely certain to be a win in every case, and based on the work Peter has done it seems like it might be a quite large win. In fact, you have to ask yourself just exactly how much our function-calling convention is costing us in general. We use that mechanism an awful lot and whatever loss of cycles is involved would be spread all over the code base where oprofile or gprof won't easily be able to pick it out. Even the cost at any particular call site will be split between the caller and the callee. There might be more stuff we could optimize here than just sorting... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RangeVarGetRelid()
On Thu, Nov 17, 2011 at 11:49:06PM -0500, Robert Haas wrote: > On Thu, Nov 17, 2011 at 10:48 PM, Noah Misch wrote: > > On Thu, Nov 17, 2011 at 08:59:58PM -0500, Robert Haas wrote: > >> --- a/src/include/catalog/namespace.h > >> +++ b/src/include/catalog/namespace.h > >> @@ -47,9 +47,15 @@ typedef struct OverrideSearchPath > >> bool addTemp; /* implicitly prepend temp > >> schema? */ > >> } OverrideSearchPath; > >> > >> +typedef void (*RangeVarGetRelidCallback)(const RangeVar *relation, Oid > >> relId, > >> + Oid oldRelId); > >> > >> -extern Oid RangeVarGetRelid(const RangeVar *relation, LOCKMODE lockmode, > >> - bool missing_ok, bool nowait); > >> +#define RangeVarGetRelid(relation, lockmode, missing_ok, nowait) \ > >> + RangeVarGetRelidExtended(relation, lockmode, missing_ok, nowait, NULL) > >> + > >> +extern Oid RangeVarGetRelidExtended(const RangeVar *relation, > >> + LOCKMODE lockmode, bool missing_ok, bool nowait, > >> + RangeVarGetRelidCallback callback); > > > > I like the split of RangeVarGetRelidExtended from RangeVarGetRelid. Shall > > we > > also default missing_ok=false and nowait=false for RangeVarGetRelid? > > I thought about that, but just did it this way for now to keep the > patch small for review purposes. Fair enough. > nowait = true is only used in one > place, so it probably makes sense to default it to false in the > non-extended version. But there are a number of callers who want > missing_ok = true behavior, so I'm inclined not to think that one > should be available in the non-extended version. [assuming the "not" in your last sentence was unintended] I count 1/25 callers overriding nowait and 3/25 overriding missing_ok. So, it's looking like a less-common override than the callback function will come to be. 
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] vpath builds and verbose error messages
Excerpts from Peter Eisentraut's message of vie nov 18 01:34:18 -0300 2011: > When using verbose error messages (psql \set VERBOSITY verbose) with a > vpath build, you get this sort of thing: > > ERROR: 42703: column "foo" does not exist > LINE 1: select foo; >^ > LOCATION: transformColumnRef, > /build/buildd-postgresql-9.1_9.1.1-3-i386-AP0ovQ/postgresql-9.1-9.1.1/build/../src/backend/parser/parse_expr.c:766 > > Can we shorten that path somehow? Either in the C code when printing it > out, or during the build. Any ideas? Huh, I hadn't noticed, even though I see those all the time. I agree with shortening the path so that it's relative to the root source dir, if that's what you propose: LOCATION: transformColumnRef, src/backend/parser/parse_expr.c:766 If it can be done at build time that would be best, because we wouldn't be carrying thousands of useless copies of the source path ... I have no suggestions on how to do this, however. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Review for "Add permission check on SELECT INTO"
The patch is in context diff format and applies cleanly. The functionality is needed because it keeps users from circumventing privilege restrictions, as they can currently do in this case. There is no documentation, which I think is OK since it changes behaviour to work as documented. The patch compiles without warning. It contains a test case, but that test case does not test the feature at all! The expected (and encountered) result is:

CREATE SCHEMA selinto_schema;
CREATE USER selinto_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA selinto_schema REVOKE INSERT ON TABLES FROM selinto_user;
SET SESSION AUTHORIZATION selinto_user;
SELECT * INTO TABLE selinto_schema.tmp1 FROM onek WHERE onek.unique1 < 2; -- Error
ERROR: permission denied for relation onek
RESET SESSION AUTHORIZATION;
DROP SCHEMA selinto_schema CASCADE;
DROP USER selinto_user;

This does not fail because "selinto_user" lacks INSERT permission on "selinto_schema.tmp1" (he doesn't!), but because he lacks SELECT permission on "onek" (as the error message indicates). Setting default privileges on a schema can never revoke default privileges. The patch works as advertised in that it causes SELECT ... INTO and CREATE TABLE ... AS to fail, however the error message is misleading. 
Here is a (correct) test case:

CREATE ROLE laurenz LOGIN;
CREATE TABLE public.test(id, val) AS VALUES (1, 'eins'), (2, 'zwei');
GRANT SELECT ON public.test TO laurenz;
ALTER DEFAULT PRIVILEGES FOR ROLE laurenz REVOKE INSERT ON TABLES FROM laurenz;
SET ROLE laurenz;
CREATE TABLE public.copy1(a, b) AS SELECT * FROM public.test;
ERROR: whole-row update is not implemented
CREATE TABLE public.copy1(a) AS SELECT id FROM public.test;
ERROR: whole-row update is not implemented
SELECT * INTO public.copy2 FROM public.test;
ERROR: whole-row update is not implemented
RESET ROLE;
ALTER DEFAULT PRIVILEGES FOR ROLE laurenz GRANT ALL ON TABLES TO laurenz;
DROP TABLE test;
DROP ROLE laurenz;

The correct error message would be:
ERROR: permission denied for relation copy1

It seems like a wrong code path is taken in ExecCheckRTEPerms, maybe there's something wrong with rte->modifiedCols. I'll mark the patch as "Waiting on Author" until these problems are addressed. Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] range_adjacent and discrete ranges
On Nov18, 2011, at 09:25 , Jeff Davis wrote: > While thinking about range_cmp_bounds, I started to think that the way > range_adjacent works is wrong. > > range_adjacent() depends on the bounds of two ranges to match up, such > that the boundary values are equal, but one is exclusive and the other > inclusive, and one is a lower bound and the other an upper bound. > That makes perfect sense for continuous ranges because that is the only > way they can possibly be adjacent. It also works for the discrete ranges > as defined so far, because they use a canonical function that translates > the values to [) form. But if someone were to write a canonical function > that translates the ranges to [] or () form, range_adjacent would be > useless. Hm, the problem here is for range_adjacent to recognize that [1,2] is adjacent to [3,4] when treated as integer ranges, but that they're not adjacent when treated as float ranges. The reason being, of course, that there isn't any integer between 2 and 3, but there are floats between 2 and 3. That information, however, *is* already contained in the canonical functions, because those functions know that (2,3) is empty as an integer range, but non-empty as a float range. For example, [1,2] is adjacent to [3,4] as integer ranges because (2,3) is empty as an integer range. Conversely, since (2,3) is *not* empty as a float range, [1,2] and [3,4] are *not* adjacent as float ranges. More formally, let there be two arbitrary ranges (a, b, i_a, i_b) and (c, d, i_c, i_d), where the first two parameters are the lower respectively upper bound, and the last two are booleans saying whether the lower respectively upper bound is inclusive (true) or exclusive (false). These ranges are then adjacent exactly if the range (b, c, !i_b, !i_c) is empty. This definition does not depend on any specific canonical form of ranges, only on the canonicalize function's ability to detect empty ranges. 
best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Collecting statistics on CSV file data
(2011/11/18 16:25), Etsuro Fujita wrote: > Thank you for your testing. I updated the patch according to your > comments. Attached is the updated version of the patch. I'd like to share the result of my review even though it's not fully finished. So far I have looked from the viewpoint of API design, code formatting, and documentation. I'll examine the effectiveness of the patch and the details of the implementation next week, and hopefully try writing an ANALYZE handler for pgsql_fdw :) The new patch has the correct format, and it could be applied to the HEAD of the master branch cleanly. All regression tests, including those of contrib modules, have passed. It contains code changes and regression tests related to the issue, and they have enough comments. IMO the documentation in this patch is not enough to show FDW authors how to write an analyze handler, but it can be enhanced afterward. With this patch, an FDW author can provide an optional ANALYZE handler which updates the statistics of foreign tables. The planner would be able to generate better plans by using those statistics. > Yes. But in the updated version, I've refactored analyze.c a little bit > to allow FDW authors to simply call do_analyze_rel(). > The updated version enables FDW authors to just write their own > acquire_sample_rows(). On the other hand, by retaining to hook > AnalyzeForeignTable routine at analyze_rel(), higher level than > acquire_sample_rows() as before, it allows FDW authors to write > AnalyzeForeignTable routine for foreign tables on a remote server to ask > the server for its current stats instead, as pointed out earlier by Tom > Lane. IIUC, this patch offers three options to FDWs: a) set AnalyzeForeignTable to NULL to indicate lack of capability, b) provide an AnalyzeForeignTable which calls do_analyze_rel with a custom sample_row_acquirer, and c) create statistics data from scratch in the FDW itself by doing similar things to do_analyze_rel with the given arguments, or copy statistics from a remote PostgreSQL server. 
ISTM that this design is well balanced between simplicity and flexibility. Maybe these three options would suit web-based wrappers, file-based or RDBMS wrappers, and pgsql_fdw respectively. I think that adding more details about FdwRoutine, such as the purpose of the new callback function and how it differs from the required ones, would help FDW authors, including me :) I have some random comments:

- I think a separate typedef for sample_acquire_rows would make the code more readable. In addition, the parameters of the function should be described explicitly.
- I couldn't see the reason why file_fdw sets the ctid of sample tuples, though I guess it's for Vitter's random sampling algorithm. If every FDW must set a valid ctid on sample tuples, it should be mentioned in the documentation of AnalyzeForeignTable. Does exporting some functions from analyze.c relate to this issue?
- Why does file_fdw skip sample tuples which have a NULL value? AFAIS the original acquire_sample_rows doesn't do so.
- Some comment lines go past 80 columns.
- Some headers included in file_fdw.c seem unnecessary.

Regards, -- Shigeru Hanada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FlexLocks
On Thu, Nov 17, 2011 at 10:19 AM, Pavan Deolasee wrote: > On Thu, Nov 17, 2011 at 10:01 AM, Robert Haas wrote: > >> >> I am not convinced that that's a better API. I mean, consider >> something like this: >> >> /* >> * OK, let's do it. First let other backends know I'm in ANALYZE. >> */ >> LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); >> MyProc->vacuumFlags |= PROC_IN_ANALYZE; >> LWLockRelease(ProcArrayLock); > >> I'm not sure exactly how you'd proposed to rewrite that, but I think >> it's almost guaranteed to be more than three lines of code. > > I would guess the ReqRes will look something like this where > ReqResRequest/Response would probably be union of all various requests > and responses, one for each type of request: > > struct ReqRes { > ReqResRequestType reqtype; > ReqResRequest req; > ReqResResponse res; > } > > The code above can be rewritten as: > > reqRes.reqtype = RR_PROC_SET_VACUUMFLAGS; > reqRes.req.set_vacuumflags.flags = PROC_IN_ANALYZE; > LWLockExecute(ProcArrayLock, LW_EXCLUSIVE, &reqRes); > My apologies for hijacking the thread, but the work seems quite related, so I thought I should post here instead of starting a new thread. Here is a WIP patch based on the idea of having a shared Q. A process trying to access the shared memory protected by an LWLock sets up the task in its PGPROC and calls a new API, LWLockExecute(). If the LWLock is available, the task is performed immediately and the function returns. Otherwise, the process queues itself up on the lock. When the last shared lock holder or the exclusive lock holder calls LWLockRelease(), it scans through such pending tasks, executes them via a callback mechanism and wakes all those processes along with any other normal waiter(s) waiting on LWLockAcquire(). I have only coded for ProcArrayEndTransaction, but it should be fairly easy to extend the usage to some more places, especially those which do some simple modifications to the protected area. 
I don't propose to use the technique for every user of LWLock, but there can be some obvious candidates, including this one that Robert found out. I see a 35-40% improvement for 32-80 clients on a 5-minute pgbench -N run with scale factor of 100 and permanent tables. This is on a 32-core HP IA box. There are a few things that need some deliberation. The pending tasks are right now executed while holding the mutex (spinlock). This is good and bad for obvious reasons. We can possibly change that so that the work is done without holding the spinlock, or leave it to the caller to choose the behavior. Doing it without holding the spinlock will make the technique interesting for many more callers. We can also rework the task execution so that similar pending requests from multiple callers can be combined and executed with a single callback, if the caller knows it's safe to do so. I haven't thought through the API/callback changes to support that, but it's definitely possible and could be quite useful in many cases. For example, the status of many transactions can be checked with a single lookup of the ProcArray. Or WAL inserts from multiple processes can be combined and written at once. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com commit 24f8e349d085e646cb918c552cc8ead7d38f7013 Author: Pavan Deolasee Date: Fri Nov 18 15:49:54 2011 +0530 Implement a shared work Q mechanism. A process can queue its work for later execution if the protecting lock is currently not available. Backend which releases the last lock will finish the work and wake up the waiting process. 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c index 1a48485..59d2958 100644 --- a/src/backend/storage/ipc/procarray.c +++ b/src/backend/storage/ipc/procarray.c @@ -157,6 +157,7 @@ static int KnownAssignedXidsGetAndSetXmin(TransactionId *xarray, TransactionId xmax); static TransactionId KnownAssignedXidsGetOldestXmin(void); static void KnownAssignedXidsDisplay(int trace_level); +static bool ProcArrayEndTransactionWQ(WorkQueueData *wqdata); /* * Report shared-memory space needed by CreateSharedProcArray. @@ -331,8 +332,6 @@ ProcArrayRemove(PGPROC *proc, TransactionId latestXid) elog(LOG, "failed to find proc %p in ProcArray", proc); } - - /* * ProcArrayEndTransaction -- mark a transaction as no longer running * @@ -352,33 +351,24 @@ ProcArrayEndTransaction(PGPROC *proc, TransactionId latestXid) if (TransactionIdIsValid(latestXid)) { /* - * We must lock ProcArrayLock while clearing proc->xid, so that we do - * not exit the set of "running" transactions while someone else is - * taking a snapshot. See discussion in - * src/backend/access/transam/README. + * Use the shared work queue mechanism to get the work done. If the + * ProcArrayLock is available, it will done immediately, otherwise it + * will be queued up and some other backend (the one who releases the + * lock las
[HACKERS] proposal: better support for debugging of overloaded functions
Hello, I am missing a unique identifier in the exception info. I would like to use it in combination with the \sf statement. I have a log:

WARNING: NP_CPS: a cannot to build a RSLT object DETAIL: dsql_text: SELECT * FROM public._npacceptflatfile(order_nr:=to_number('O0032', 'O')::int,sequence_nr:=1,ref_sequence_nr:=2,recipient_op:=201,losing_op:=303) message: cannot to identify real type for record type variable CONTEXT: PL/pgSQL function "assign_rslts" line 50 at assignment SQL statement "SELECT workflow.assign_rslts('2011-12-18', '09:30:30', to_operator := 201, from_operator := 303)" PL/pgSQL function "inline_code_block" line 855 at PERFORM

and I would like to look at the "assign_rslts" function, but

ohs=# \sf workflow.assign_rslts ERROR: more than one function named "workflow.assign_rslts"

and I have to find the parameters and manually build a parameter list. My proposal is to enhance the CONTEXT line with the function's oid, plus the possibility to use this oid in \sf and \df, something like

CONTEXT: PL/pgSQL function "assign_rslts" line 50 at assignment (oid: 65903) ... \sf+ 65903

This simple feature could reduce the time that is necessary to identify a bug in overloaded functions. Another possibility is just enhancing the context line to be more friendly to the \sf statement:

CONTEXT: PL/pgSQL function "workflow.assign_rslts(date,time without time zone,operatorid_type,operatorid_type)" line 50 at assignment

But this is not very readable and its implementation is harder, because at exception time there is no access to the system tables - so this string would have to be cached somewhere. Notes, ideas?? Regards Pavel Stehule -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Core Extensions relocation
On 11/17/2011 03:18 PM, Peter Eisentraut wrote:
> Who's to say that after this, the core extensions won't end up in a new
> separate package postgresql-extensions (or similar) or might even stay
> in postgresql-contrib, for compatibility?

I don't know why packagers would make an active decision that would make their lives more difficult, with no benefit to them and a regression against project recommendations for their users. The last thing anyone packaging PostgreSQL wants is more packages to deal with; there are already too many. Each of the current sub-packages has a legitimate technical or distribution-standard reason to exist--guidelines like "break out client and server components" or "minimize the package dependencies for the main server". I can't think of any good reason that would inspire the sort of drift you're concerned about.

There's little compatibility argument beyond consistency with the previous packages. The reason this is suddenly feasible now is that the under-the-hood changes are almost all hidden by CREATE EXTENSION.

And if someone wanted to wander this way, they'd end up having to maintain a doc patch to address the fact that they've broken with project recommendations. This text in what I submitted would no longer be true: "This appendix contains information regarding core extensions that are built and included with a standard installation of PostgreSQL."

One of the reasons I picked the name I did was to contrast with the existing description of contrib: "porting tools, analysis utilities, and plug-in features that are not part of the core PostgreSQL system, mainly because they address a limited audience or are too experimental to be part of the main source tree." That says it's perfectly fine to make those optional in another package--they're not "part of the core". That scary wording is practically telling packagers to separate them, so it's easy to keep the experimental stuff away from the production-quality components.
--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
Re: [HACKERS] Inlining comparators as a performance optimisation
On Fri, Nov 18, 2011 at 5:20 AM, Robert Haas wrote:
> I think that we should really consider doing with this patch what Tom
> suggested upthread; namely, looking for a mechanism to allow
> individual datatypes to offer up a comparator function that doesn't
> require bouncing through FunctionCall2Coll(). It seems to me that
> duplicating the entire qsort() algorithm is a little iffy. Sure, in
> this case it works out to a win. But it's only a small win -
> three-quarters of it is in the uncontroversial activity of reducing
> the impedance mismatch - and how do we know it will always be a win?
> Adding more copies of the same code can be an anti-optimization if it
> means that a smaller percentage of it fits in the instruction cache,
> and sometimes small changes in runtime are caused by random shifts in
> the layout of memory that align things more or less favorably across
> cache lines rather than by real effects. Now it may well be that this
> is a real effect, but will it still look as good when we do this for
> 10 data types? For 100 data types?
>
> In contrast, it seems to me that reducing the impedance mismatch is
> something that we could go and do across our entire code base, and
> every single data type would benefit from it. It would also be
> potentially usable by other sorting algorithms, not just quick sort.

I don't think it's credible to implement that kind of generic improvement at this stage of the release cycle. It has a much bigger impact, since it potentially affects all internal datatypes and external ones as well. Definitely a longer-term way forward.

If individual datatypes offer up comparator functions, that is easily going to result in more code than is being suggested here. So the argument about flooding the CPU cache works against your alternate proposal, not in favour of it. We have no proof that we need to do this for 10 or 100 data types; we only currently have proof that there is a gain for the most common types.
Of course, it sounds like it might be useful to allow any data type to gain an advantage, but we shouldn't be blind to the fact that almost nobody will use such a facility, and if they do, the code won't be written for a long time yet. If this came as a request from custom-datatype authors complaining of slow sorts it would be different, but it didn't, so we don't even know whether anybody would ever write user-defined comparator routines. Rejecting a patch because of a guessed user requirement is not good.

Peter's suggested change adds very few lines of code, and those compile to some very terse code, a few hundred instructions at the very most. Spending an extra few cachelines to improve qsort by so much is still an easy overall win.

The OP's change improves qsort dramatically, and it is a small, isolated patch. There is no significant downside. We also have it now, so let's commit this, chalk up another very good performance improvement, and use our time on something else this commitfest, such as the flexlocks idea.

--
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] range_adjacent and discrete ranges
While thinking about range_cmp_bounds, I started to think that the way range_adjacent works is wrong.

range_adjacent() depends on the bounds of two ranges to match up, such that the boundary values are equal, but one is exclusive and the other inclusive, and one is a lower bound and the other an upper bound. That makes perfect sense for continuous ranges, because that is the only way they can possibly be adjacent. It also works for the discrete ranges as defined so far, because they use a canonical function that translates the values to [) form. But if someone were to write a canonical function that translates the ranges to [] or () form, range_adjacent would be useless.

There are a few approaches to solving it:

1. Make the canonical function accept a "format" argument, much like the constructor, and have an output format stored in the catalog that would be passed in under most circumstances. However, range_adjacent could pass in its own form that would be useful for its purposes.

2. Replace the canonical function with "inc" and "dec" functions, or some variation thereof. We'd still need some kind of output format to match the current behavior; otherwise every range would always be output in [) form (I don't necessarily object to that, but it would be a behavior difference for user-defined range types).

3. Remove range_adjacent.

4. Do nothing, and document that the canonical function should translate to either [) or (] if you want range_adjacent to work.

Thoughts?

Regards,
	Jeff Davis
Re: [HACKERS] Are range_before and range_after commutator operators?
On Thu, 2011-11-17 at 17:10 -0500, Tom Lane wrote:
> Applied, thanks. These comments aren't quite what I'd hoped for though.
> What I'm lacking is the conceptual context, ie, why is a
> less-equal-greater primitive for bounds a good thing? It seems like
> when you consider the four possible directional (lower/upper)
> combinations, the same result from range_cmp_bounds means something
> different in each case, and I find that confusing. I wonder whether
> functions defined along set-theoretic lines (this bound is strictly
> weaker or stronger than this other one, or these bounds together define
> an empty or singleton or non-singleton set) might be more transparent.

I think it comes down to how helpful it is to define higher-level functions in terms of range_cmp_bounds versus some other function (or set of functions). range_cmp_bounds seemed to work out fairly well for most of the operators, and it was the best that I came up with. The nice thing about it is that it can compare a lower bound to another lower bound, or to an upper bound.

Then again, perhaps I tried too hard to unify those concepts, and it just led to complexity? I'm open to suggestions.

Regards,
	Jeff Davis