Re: [HACKERS] testing ProcArrayLock patches
On Saturday, November 19, 2011 12:18:07 AM Kevin Grittner wrote:
> Andres Freund wrote:
> > I think opannotate -a -s produces output with instructions/code
> > intermingled.
>
> Thanks. I'll check out perf later (thanks for the tips!), but for
> now, here's the function which was at the top of my oprofile
> results, annotated with those options. I'm afraid it's a bit
> intimidating to me -- the last time I did much with X86 assembly
> language was in the mid-80s, on an 80286. :-/

While my assembly knowledge surely isn't from the 80s, be assured that I find it intimidating as well ;)

> Hopefully, since this is at the top of the oprofile results when
> running with prepared statements, it will be of use to somebody.

I think hash_search_with_hash_value is rather noticeable in profiles in quite many situations. Even without concurrency...

Looking at your annotation output, the code seems to be almost entirely stalled waiting for memory. The first stall is after the first memory access which is likely to be uncached (the first cacheline of the HTAB is accessed before that, but it will be in the cache). The interesting thing is that I would have expected a higher likelihood for this to stay in the cache.

  2225 0.0165 :  70b543: mov    (%rdi),%r15
              :    static inline uint32
              :    calc_bucket(HASHHDR *hctl, uint32 hash_val)
              :    {
              :        uint32 bucket;
              :
              :        bucket = hash_val & hctl->high_mask;
  4544 0.0337 :  70b546: and    0x2c(%r15),%ebx
              :        if (bucket > hctl->max_bucket)
 53409 0.3958 :  70b54a: cmp    0x28(%r15),%ebx
              :  70b54e: jbe    70b554

So a stall here is not that surprising. Here we fetch data from memory which is unlikely to be prefetchable and then require the result of that fetch. Note how "segp = hashp->dir[segment_num];" is distributed over lines 52, 64 and 83.

              :        segp = hashp->dir[segment_num];
  2062 0.0153 :  70b562: shr    %cl,%eax
   309 0.0023 :  70b564: mov    %eax,%eax
   643 0.0048 :  70b566: mov    (%rdx,%rax,8),%rbp
              :
              :        if (segp == NULL)
 43329 0.3211 :  70b56a: test   %rbp,%rbp

The next cacheline is referenced here.
Again a fetch from memory which is needed soon after to continue. Unless I misunderstood the code flow, this disproves my theory that we might have many collisions, as that test seems to be outside the loop.

               :        prevBucketPtr = &segp[segment_ndx];
               :        currBucket = *prevBucketPtr;
   122 9.0e-04 :  70b586: mov    0x0(%rbp),%rbx
               :
               :        /*
               :         * Follow collision chain looking for matching key
               :         */
               :        match = hashp->match;        /* save one fetch in inner loop */
               :        keysize = hashp->keysize;    /* ditto */
 99903  0.7404 :  70b58a: mov    %rax,0x18(%rsp)
               :
               :        while (currBucket != NULL)
  1066  0.0079 :  70b58f: test   %rbx,%rbx

Line 136 is the first time the contents of the current bucket are needed. That's why the test is so noticeable.

               :        currBucket = *prevBucketPtr;
   655  0.0049 :  70b5a3: mov    (%rbx),%rbx
               :         * Follow collision chain looking for matching key
               :         */
               :        match = hashp->match;        /* save one fetch in inner loop */
               :        keysize = hashp->keysize;    /* ditto */
               :
               :        while (currBucket != NULL)
   608  0.0045 :  70b5a6: test   %rbx,%rbx
               :  70b5a9: je     70b5d0
               :        {
               :            if (currBucket->hashvalue == hashvalue &&
  3504  0.0260 :  70b5ab: cmp    %r12d,0x8(%rbx)
 98486  0.7299 :  70b5af: nop
  1233  0.0091 :  70b5b0: jne    70b5a0

That covers all the slow points in the function. And unless I am missing something, those are all the fetched cachelines of that function -- for HASH_FIND, that is. So I think that reinforces my belief that ordinary cache misses are the culprit here. Which is to be expected in a hashtable...

Andres

PS: No idea whether that rambling made sense to anyone... But I looked at that function for the first time ;)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
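[Editor's note: the bucket calculation being profiled above is short enough to sketch standalone. The following is a simplified model of it, with names following the annotated source lines; it is not the actual dynahash code.]

```c
#include <stdint.h>

/*
 * Simplified standalone model of the bucket calculation shown in the
 * annotated output above.  Field names follow the source lines in the
 * profile, but this is a sketch, not PostgreSQL's dynahash.
 */
typedef struct
{
	uint32_t	max_bucket;		/* ID of maximum bucket in use */
	uint32_t	high_mask;		/* mask to modulo into entire table */
	uint32_t	low_mask;		/* mask to modulo into lower half of table */
} HashHeaderSketch;

static uint32_t
calc_bucket_sketch(const HashHeaderSketch *hctl, uint32_t hash_val)
{
	uint32_t	bucket = hash_val & hctl->high_mask;

	/*
	 * With linear hashing only part of the doubled table exists at any
	 * moment, so hash values mapping past max_bucket fold back into the
	 * lower half of the table.
	 */
	if (bucket > hctl->max_bucket)
		bucket = bucket & hctl->low_mask;

	return bucket;
}
```

Both masks live in the hash header, which is why the very first dereference of hctl (the `mov (%rdi),%r15`) is a natural place for a cache miss to surface in the profile.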
Re: [HACKERS] Inlining comparators as a performance optimisation
On Fri, Nov 18, 2011 at 4:38 PM, Peter Geoghegan wrote:
> I understand that we highly value extensibility and genericity (yes,
> that's a real word). We may not always be well served by that
> tendency.

True (except that genericity is not a synonym for generality AFAICT). A good fraction of the optimizations that we've done over the last, well, pick any arbitrary period of time, consists in adding special cases that occur frequently enough to be worth optimizing - e.g. HOT, or fast relation locks - often to extremely good effect.

> Firstly, 1/4 of a quite large gain is still a pretty good gain.
> Secondly, I probably didn't actually isolate the effects of inlining,
> nor the overall benefit of the compiler knowing the comparator at
> compile-time. I just removed the inline keyword.

Maybe we should look at trying to isolate that a bit better. It strikes me that we could probably create an API that would support doing either of these things depending on the wishes of the underlying datatype. For example, imagine that we're sorting with <(int4, int4). We associate a PGPROC-callable function with that operator that returns "internal", really a pointer to a struct. The first element of the struct is a pointer to a comparison function that qsort() (or a tape sort) can invoke without a trampoline; the second is a wholesale replacement for qsort(); either or both can be NULL. Given that, it seems to me that we could experiment with this pretty easily, and if it turns out that only one of them is worth doing, it's easy to drop one element out of the structure.

Or do you have another plan for how to do this?

> Fair enough, but it's not the only test I did - I posted other numbers
> for the same query when the table was 48mb, and we saw a proportional
> improvement, consistent with a per-comparison win. I'm supposed to be
> on leave for a few days at the moment, so I won't be very active this
> weekend, but I'm rather curious as to where you or others would like
> to see me go with benchmarks.
>
> I should point out that we currently don't have much idea how big of a
> win applying these principles could be for index creation times... it
> could possibly be very significant. My profiling of index creation
> makes this look promising.

Have you done any benchmarks where this saves seconds or minutes, rather than milliseconds? That would certainly make it more exciting, at least to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
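[Editor's note: the struct-based API Robert describes above could look something like the following sketch. All names here are invented for illustration; this is not actual PostgreSQL code, just the shape of the proposal.]

```c
#include <stdlib.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical shape of the proposed struct: the datatype hands back a
 * trampoline-free comparator, a wholesale qsort() replacement, or both.
 * Either member may be NULL.
 */
typedef struct SortSupportSketch
{
	int			(*comparator) (const void *a, const void *b);
	void		(*sort_replacement) (void *data, size_t nitems, size_t itemsize);
} SortSupportSketch;

/* What an int4-style datatype might supply for the comparator slot. */
static int
int32_cmp_raw(const void *a, const void *b)
{
	int32_t		av = *(const int32_t *) a;
	int32_t		bv = *(const int32_t *) b;

	return (av < bv) ? -1 : (av > bv) ? 1 : 0;
}

/* The sort code then picks whichever strategy the datatype offered. */
static void
sort_with_support(const SortSupportSketch *ssup, void *data,
				  size_t nitems, size_t itemsize)
{
	if (ssup->sort_replacement != NULL)
		ssup->sort_replacement(data, nitems, itemsize);
	else
		qsort(data, nitems, itemsize, ssup->comparator);
}
```

Dropping one member later, as Robert suggests, would only mean deleting one field and one branch here.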
Re: [HACKERS] EXPLAIN (plan off, rewrite off) for benchmarking
On Saturday, November 19, 2011 12:16:18 AM Tom Lane wrote:
> Andres Freund writes:
> > Hi,
> > For benchmarking the parser I added the above options (dim suggested
> > this on irc) which proved to be rather useful for me.
>
> What exactly is EXPLAIN printing, if you've not done planning?

Nothing very interesting:

postgres=# EXPLAIN (rewrite off) SELECT 1, 'test', "pg_class"."oid", relname, relkind
           FROM pg_class WHERE oid = 1000;
                QUERY PLAN
------------------------------------------
 not rewriting query because auf !rewrite
(1 row)

Explain is just a vehicle here, I admit that. But on what else should I bolt it? The only thing I could think of otherwise would be to do the parsing from a C function. But to simulate a real scenario there would require too much bootstrapping for my taste.

> Also, I believe the planner depends on the assumption that the
> rewriter has done its work, so these seem to amount to EXPLAIN
> (break_it).

"rewrite off" currently simply aborts before doing the rewriting and copyObject(). copyObject() is the expensive part there for simple queries. Rewriting happened to be the functional point where I wanted to stop - because of the overhead of copyObject() - instead of a goal in itself.

Btw, optimizing copyObject() memory-usage-wise would be another big performance gain. But even murkier than the stuff over in the lists thread...

> If you just want to benchmark parsing, perhaps CREATE RULE would be a
> useful environment.

I don't really see how one can use that to benchmark parsing. CREATE OR REPLACE VIEW is - not unexpectedly - far slower than EXPLAIN (rewrite off) or EXPLAIN (plan off).
For the statement: SELECT * FROM pg_class WHERE oid = 1000;

rewrite off:                tps = 16086.694353
plan off, no copyObject():  tps = 15663.280093
plan off:                   tps = 13471.272551
explain:                    tps = 6162.161946
explain analyze:            tps = 5744.172839
normal execution:           tps = 6991.398740
CREATE OR REPLACE VIEW
(after changing the log level): tps = 2550.246625

Greetings,

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing ProcArrayLock patches
On Fri, Nov 18, 2011 at 6:46 PM, Kevin Grittner wrote:
>>> tps = 21946.961196 (including connections establishing)
>>> tps = 22911.873227 (including connections establishing)
>>>
>>> For write transactions, that seems pretty respectable.
>>
>> Very. What do you get without the patch?
>
> [quick runs a couple tests that way]
>
> Single run with -M simple:
>
> tps = 23018.314292 (including connections establishing)
>
> Single run with -M prepared:
>
> tps = 27910.621044 (including connections establishing)
>
> So, the patch appears to hinder performance in this environment,
> although certainty is quite low with so few samples. I'll schedule
> a spectrum of runs before I leave this evening (very soon).

Hmm. There's obviously something that's different in your environment or configuration from what I tested, but I don't know what it is. The fact that your scale factor is larger than shared_buffers might matter; or Intel vs. AMD. Or maybe you're running with synchronous_commit=on?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing ProcArrayLock patches
Robert Haas wrote:
> Hmm. That looks a lot like a profile with no lock contention at
> all. Since I see XLogInsert in there, I assume this must be a
> pgbench write test on unlogged tables? How close am I?

Not unless pgbench on HEAD does that by default. Here are the relevant statements:

$prefix/bin/pgbench -i -s 150
$prefix/bin/pgbench -T $time -c $clients -j $clients >>$resultfile

Perhaps the Intel cores implement the relevant primitives better? Maybe I didn't run the profile or the reports the right way?

> I was actually thinking it would be interesting to oprofile the
> read-only test; see if we can figure out where those slowdowns are
> coming from.

I'll plan on doing that this weekend.

>> tps = 21946.961196 (including connections establishing)
>> tps = 22911.873227 (including connections establishing)
>>
>> For write transactions, that seems pretty respectable.
>
> Very. What do you get without the patch?

[quick runs a couple tests that way]

Single run with -M simple:

tps = 23018.314292 (including connections establishing)

Single run with -M prepared:

tps = 27910.621044 (including connections establishing)

So, the patch appears to hinder performance in this environment, although certainty is quite low with so few samples. I'll schedule a spectrum of runs before I leave this evening (very soon).

-Kevin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing ProcArrayLock patches
On Fri, Nov 18, 2011 at 2:05 PM, Kevin Grittner wrote:
> Robert Haas wrote:
>> Any chance you can run oprofile (on either branch, don't really
>> care) against the 32 client test and post the results?
>
> [ oprofile results ]

Hmm. That looks a lot like a profile with no lock contention at all. Since I see XLogInsert in there, I assume this must be a pgbench write test on unlogged tables? How close am I?

I was actually thinking it would be interesting to oprofile the read-only test; see if we can figure out where those slowdowns are coming from.

> Two runs:
>
> tps = 21946.961196 (including connections establishing)
> tps = 22911.873227 (including connections establishing)
>
> For write transactions, that seems pretty respectable.

Very. What do you get without the patch?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing ProcArrayLock patches
Andres Freund wrote:
> I think opannotate -a -s produces output with instructions/code
> intermingled.

Thanks. I'll check out perf later (thanks for the tips!), but for now, here's the function which was at the top of my oprofile results, annotated with those options. I'm afraid it's a bit intimidating to me -- the last time I did much with X86 assembly language was in the mid-80s, on an 80286. :-/ Hopefully, since this is at the top of the oprofile results when running with prepared statements, it will be of use to somebody.

The instructions which are shown as having that 1% still seem odd to me, but as you say, they were probably actually waiting for some previous operation to finish:

 43329  0.3211 :  70b56a: test   %rbp,%rbp
 99903  0.7404 :  70b58a: mov    %rax,0x18(%rsp)

If anyone wants any other detail from what I captured, let me know.

-Kevin

0070b520 <hash_search_with_hash_value>: /* hash_search_with_hash_value total: 495463  3.6718 */
               :hash_search_with_hash_value(HTAB *hashp,
               :                            const void *keyPtr,
               :                            uint32 hashvalue,
               :                            HASHACTION action,
               :                            bool *foundPtr)
               :{
  5023  0.0372 :  70b520: push   %r15
  5967  0.0442 :  70b522: push   %r14
  1407  0.0104 :  70b524: mov    %rdi,%r14
    30 2.2e-04 :  70b527: push   %r13
  2495  0.0185 :  70b529: push   %r12
  2631  0.0195 :  70b52b: mov    %edx,%r12d
    18 1.3e-04 :  70b52e: push   %rbp
  1277  0.0095 :  70b52f: push   %rbx
               :    static inline uint32
               :    calc_bucket(HASHHDR *hctl, uint32 hash_val)
               :    {
               :        uint32 bucket;
               :
               :        bucket = hash_val & hctl->high_mask;
  2122  0.0157 :  70b530: mov    %edx,%ebx
               :hash_search_with_hash_value(HTAB *hashp,
               :                            const void *keyPtr,
               :                            uint32 hashvalue,
               :                            HASHACTION action,
               :                            bool *foundPtr)
               :{
   247  0.0018 :  70b532: sub    $0x58,%rsp
   236  0.0017 :  70b536: mov    %rsi,0x10(%rsp)
  3851  0.0285 :  70b53b: mov    %ecx,0xc(%rsp)
  2551  0.0189 :  70b53f: mov    %r8,(%rsp)
               :    HASHHDR *hctl = hashp->hctl;
  2225  0.0165 :  70b543: mov    (%rdi),%r15
               :    static inline uint32
               :    calc_bucket(HASHHDR *hctl, uint32 hash_val)
               :    {
               :        uint32 bucket;
               :
               :        bucket = hash_val & hctl->high_mask;
  4544  0.0337 :  70b546: and    0x2c(%r15),%ebx
               :        if (bucket > hctl->max_bucket)
 53409  0.3958 :  70b54a: cmp    0x28(%r15),%ebx
               :  70b54e: jbe    70b554
               :            bucket = bucket & hctl->low_mask;
  3324  0.0246 :  70b550: and    0x30(%r15),%ebx
               :    bucket = calc_bucket(hctl, hashvalue);
               :
               :    segment_num = bucket >> hashp->sshift;
               :    segment_ndx = MOD(bucket, hashp->ssize);
               :
               :    segp = hashp->dir[segment_num];
  9702  0.0719 :  70b554: mov    0x58(%r14),%ecx
  2428  0.0180 :  70b558: mov    %ebx,%eax
   489  0.0036 :  70b55a: mov    0x8(%r14),%rdx
               :     * Do the initial lookup
               :     */
               :    bucket = calc_bucket(hctl, hashvalue);
               :
               :    segment_num = bucket >> hashp->sshift;
               :    segment_ndx = MOD(bucket, hashp->ssize);
   391  0.0029 :  70b55e: mov    0x50(%r14),%r13
               :
               :    segp = hashp->dir[segment_num];
  2062  0.0153 :  70b562: shr    %cl,%eax
   309  0.0023 :  70b564: mov    %eax,%eax
   643  0.0048 :  70b566: mov    (%rdx,%rax,8),%rbp
               :
               :    if (segp == NULL)
 43329  0.3211 :  70b56a: test   %rbp,%rbp
  1284  0.0095 :  70b56d: je     70b727
               :        hash_corrupted(hashp);
               :
               :    prevBucketPtr = &segp[segment_ndx];
  1878  0.0139 :  70b573: lea    -0x1(%r13),%rax
               :    currBucket = *prevBucketPtr;
               :
               :    /*
               :     * Follow collision chain looking for matching key
               :     */
               :    match = hashp->match;
Re: [HACKERS] EXPLAIN (plan off, rewrite off) for benchmarking
Andres Freund writes:
> Hi,
> For benchmarking the parser I added the above options (dim suggested
> this on irc) which proved to be rather useful for me.

What exactly is EXPLAIN printing, if you've not done planning? Also, I believe the planner depends on the assumption that the rewriter has done its work, so these seem to amount to EXPLAIN (break_it).

If you just want to benchmark parsing, perhaps CREATE RULE would be a useful environment.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should a materialized view be based on a view?
On 18 November 2011 22:26, Kevin Grittner wrote:
> I still have a lot of reading to do before I propose anything
> concrete for development, but one thing that has already struck me
> as a common theme for MVs is that a lot of people seem to like the
> idea of first creating a "normal" view, and then materializing it.
> That seems pretty attractive to me, too. How do people feel about
> that as a fundamental design decision: that a MV would always have
> a corresponding view (under a different name or in a different
> schema)? Love it or hate it?

Is there a need to create it as a normal view first? Can't the CREATE VIEW syntax be expanded to support MV capabilities? (CREATE [ MATERIALIZED ] VIEW...) And then ALTER VIEW can materialise a regular view, or dematerialise a materialised view.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should a materialized view be based on a view?
On 18 November 2011 23:26, Kevin Grittner wrote:
> I still have a lot of reading to do before I propose anything
> concrete for development, but one thing that has already struck me
> as a common theme for MVs is that a lot of people seem to like the
> idea of first creating a "normal" view, and then materializing it.
> That seems pretty attractive to me, too. How do people feel about
> that as a fundamental design decision: that a MV would always have
> a corresponding view (under a different name or in a different
> schema)? Love it or hate it?
>
> -Kevin

Hi Kevin,

Maybe a stupid question... but why? It looks like to create a function I should first create another function. For me the design should be simple. If you want to create something below my MV, that's fine with me, as long as I don't need to know about it (just like when creating a serial column).

regards
Szymon
Re: [HACKERS] testing ProcArrayLock patches
On Friday, November 18, 2011 11:12:02 PM Andres Freund wrote:
> On Friday, November 18, 2011 09:16:01 PM Kevin Grittner wrote:
> > Andres Freund wrote:
> > > When doing line-level profiles I would suggest looking at the
> > > instructions.
> >
> > What's the best way to do that?
>
> I think opannotate -a -s produces output with instructions/code
> intermingled.
>
> > > I don't think cache line contention is the most likely candidate
> > > here. Simple cache-misses seem far more likely. In combination
> > > with pipeline stalls...
> > >
> > > Newer cpus (nehalem+) can measure stalled cycles which can be
> > > really useful when analyzing performance. I don't remember how to
> > > do that with oprofile right now though as I use perf these days
> > > (it's -e stalled-cycles-{frontend|backend} there).
> >
> > When I run oprofile, I still always go back to this post by Tom:
> > http://archives.postgresql.org/pgsql-performance/2009-06/msg00154.php
>
> Hrm. I am on the train and for unknown reasons the only sensibly working
> protocols are smtp + pop... Waiting... Waiting...
> Sorry, too slow/high latency atm. I wrote everything below and another
> mail and the page still hasn't loaded.
>
> oprofile can produce graphs as well (--callgraph). For both tools you
> need -fno-omit-frame-pointer to get usable graphs.
>
> > Can anyone provide such a "cheat sheet" for perf? I could give that
> > a try if I knew how.
>
> Unfortunately for sensible results the kernel needs to be rather new.
> I would say > 2.6.28 or so (just guessed).
>
> # to record activity
> perf record [-g|--call-graph] program|-p pid
>
> # to view a summation
> perf report
>
> # get heaps of stats from something
> perf stat -ddd someprogram|-p pid
>
> # show what the system is executing overall
> perf top -az
>
> # get help
> perf help (record|report|annotate|stat|...)
> ...
I forgot that there is also

# get a list of event types
perf list

# measure something for a specific event
perf (record|stat|top) -e some_event_type

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing ProcArrayLock patches
On Friday, November 18, 2011 09:16:01 PM Kevin Grittner wrote:
> Andres Freund wrote:
> > When doing line-level profiles I would suggest looking at the
> > instructions.
>
> What's the best way to do that?

I think opannotate -a -s produces output with instructions/code intermingled.

> > I don't think cache line contention is the most likely candidate
> > here. Simple cache-misses seem far more likely. In combination
> > with pipeline stalls...
> >
> > Newer cpus (nehalem+) can measure stalled cycles which can be
> > really useful when analyzing performance. I don't remember how to
> > do that with oprofile right now though as I use perf these days
> > (it's -e stalled-cycles-{frontend|backend} there).
>
> When I run oprofile, I still always go back to this post by Tom:
> http://archives.postgresql.org/pgsql-performance/2009-06/msg00154.php

Hrm. I am on the train and for unknown reasons the only sensibly working protocols are smtp + pop... Waiting... Waiting... Sorry, too slow/high latency atm. I wrote everything below and another mail and the page still hasn't loaded.

oprofile can produce graphs as well (--callgraph). For both tools you need -fno-omit-frame-pointer to get usable graphs.

> Can anyone provide such a "cheat sheet" for perf? I could give that
> a try if I knew how.

Unfortunately for sensible results the kernel needs to be rather new. I would say > 2.6.28 or so (just guessed).

# to record activity
perf record [-g|--call-graph] program|-p pid

# to view a summation
perf report

graph:
# Overhead   Command  Shared Object  Symbol
# ........   .......  .............  ......
#
     4.09%  postgres  postgres       [.] slab_alloc_dyn
            |
            --- slab_alloc_dyn
               |
               |--18.52%-- new_list
               |          |
               |          |--63.79%-- lappend
               |          |          |
               |          |          |--13.40%-- find_usable_indexes
               |          |          |           create_index_paths
               |          |          |           set_rel_pathlist
               |          |          |           make_one_rel

flat:
# Overhead   Command  Shared Object  Symbol
# ........   .......  .............  ......
#
     5.10%  postgres  [vdso]         [.] 0x73d8d770
     4.26%  postgres  postgres       [.] base_yyparse
     3.88%  postgres  postgres       [.] slab_alloc_dyn
     2.82%  postgres  postgres       [.] core_yylex
     2.37%  postgres  postgres       [.] SearchCatCache
     1.85%  postgres  libc-2.13.so   [.] __memcpy_ssse3
     1.66%  postgres  libc-2.13.so   [.] __GI___strcmp_ssse3
     1.23%  postgres  postgres       [.] MemoryContextAlloc

# to view a line/source/instruction level view
perf annotate -l symbol

...
         :        /*
         :         * one-time startup overhead for each cache
         :         */
         :        if (cache->cc_tupdesc == NULL)
    0.35 :  6e81fd: 48 83 7f 28 00        cmpq   $0x0,0x28(%rdi)
 /home/andres/src/postgresql/build/optimize/../../src/backend/utils/cache/catcache.c:1070
    4.15 :  6e8202: 0f 84 54 04 00 00     je     6e865c
         : #endif
         :
         :        /*
         :         * initialize the search key information
         :         */
         :        memcpy(cur_skey, cache->cc_skey, sizeof(cur_skey));
    0.00 :  6e8208: 48 8d bd a0 fe ff ff  lea    -0x160(%rbp),%rdi
    0.17 :  6e820f: 49 8d 77 70           lea    0x70(%r15),%rsi
    0.00 :  6e8213: b9 24 00 00 00        mov    $0x24,%ecx
 /home/andres/src/postgresql/build/optimize/../../src/backend/utils/cache/catcache.c:1080
   33.22 :  6e8218: f3 48 a5              rep movsq %ds:(%rsi),%es:(%rdi)
         :        cur_skey[0].sk_argument = v1;
 /home/andres/src/postgresql/build/optimize/../../src/backend/utils/cache/catcache.c:1081
    1.56 :  6e821b: 48 89 9d e0 fe ff ff  mov    %rbx,-0x120(%rbp)
...

# get heaps of stats from something
perf stat -ddd someprogram|-p pid

       1242.409965 task-clock              #    0.824 CPUs utilized    [100.00%]
            14,572 context-switches        #    0.012 M/sec            [100.00%]
               264 CPU-migrations          #    0.000 M/sec            [100.00%]
                 0 page-faults             #    0.000 M/sec
     2,854,775,135 cycles                  #    2.298 GHz              [26.28%]
                   stalled-cycles-frontend
                   stalled-cycles-backend
     2,024,997,785 instructions            #    0.71  insns per cycle  [25.25%]
       387,240,903 bran
[HACKERS] Should a materialized view be based on a view?
I still have a lot of reading to do before I propose anything concrete for development, but one thing that has already struck me as a common theme for MVs is that a lot of people seem to like the idea of first creating a "normal" view, and then materializing it. That seems pretty attractive to me, too.

How do people feel about that as a fundamental design decision: that a MV would always have a corresponding view (under a different name or in a different schema)? Love it or hate it?

-Kevin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inlining comparators as a performance optimisation
On Fri, Nov 18, 2011 at 12:20:26AM -0500, Robert Haas wrote:
> I think that we should really consider doing with this patch what Tom
> suggested upthread; namely, looking for a mechanism to allow
> individual datatypes to offer up a comparator function that doesn't
> require bouncing through FunctionCall2Coll(). It seems to me that
> duplicating the entire qsort() algorithm is a little iffy. Sure, in
> this case it works out to a win. But it's only a small win -
> three-quarters of it is in the uncontroversial activity of reducing
> the impedance mismatch - and how do we know it will always be a win?

There's always the old idea of a data type providing a function mapping f:(type -> int64) in such a way that it preserves order. That is, in the sense that:

    f(x) < f(y) => x < y

When sorting, you add f(x) as a hidden column and change "ORDER BY x" to "ORDER BY f(x), x". Then you only need to special-case the int64 version. This would mean that in most cases you may be able to skip the call because you're comparing integers.

The downside is you need to call f on each input. It depends on the datatype whether that's cheaper or not, but for all numeric types I think it's an easy win.

I don't think anyone has written a proof of concept for this. It does have the advantage of scaling better than coding a qsort for each individual type.

Have a nice day,

--
Martijn van Oosterhout http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he
> does not attach much importance to his own thoughts. -- Arthur Schopenhauer
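[Editor's note: as a rough illustration of the order-preserving mapping Martijn describes, here is what f could look like for two types. Helper names are invented; for floats the usual IEEE 754 bit-twiddling trick is used, ignoring NaN and the -0.0 vs 0.0 distinction.]

```c
#include <stdint.h>
#include <string.h>

/* f for a 32-bit integer: widening to int64 trivially preserves order. */
static int64_t
int32_sort_key(int32_t x)
{
	return (int64_t) x;
}

/*
 * f for a float: bit patterns of non-negative IEEE 754 floats are ordered
 * like the floats themselves, while negative floats sort in reverse bit
 * order.  Flipping bits accordingly yields an order-preserving unsigned
 * key (NaN and -0.0 vs 0.0 edge cases are ignored in this sketch).
 */
static int64_t
float_sort_key(float x)
{
	uint32_t	bits;
	uint32_t	key;

	memcpy(&bits, &x, sizeof(bits));
	if (bits & UINT32_C(0x80000000))
		key = ~bits;						/* negative: flip everything */
	else
		key = bits | UINT32_C(0x80000000);	/* non-negative: set sign bit */
	return (int64_t) key;
}
```

With such keys in a hidden column, the common path of the sort compares plain integers and only falls back to the full datatype comparator to break ties.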
Re: [HACKERS] Materialized views
"Kevin Grittner" wrote: > I'm considering submitting a proposal to management that I be > assigned to work on a declarative implementation in PostgreSQL to > allow speedier application development of software needing > materialized views. Thanks to all who provided feedback and support in response to my post. Based on the feedback here and off-list, I did submit a proposal. It was just approved by the appropriate steering committee (consisting of our CIO, the Director of State Courts, District Court Administrators, Judges, Clerks of Court, and other stake-holders) as a low-priority project. That means that I expect I'll have the time to get a patch together in time for 9.3, but the times at which the decks will be clear of other assignments to allow work on this will not be very predictable. I'll probably be on-again, off-again throughout the year. I apologize in advance for the fact that the times when I will be able to work on it might not fit well with the release cycle or CFs, but I kinda have to take what I can get in that regard. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Replace a long chain of if's in eval_const_expressions_mutator by a switch()
On Friday, November 18, 2011 10:14:22 PM Tom Lane wrote:
> Andres Freund writes:
> > For unknown reasons the function used non-chained ifs for every
> > handled nodeType. Replacing the if chain with if; else if; ...
> > resulted in a small speedup. Replacing it with a switch() in a
> > bigger one.
>
> Cool, but this patch is impossible to validate by eye. Could you
> resubmit a version that doesn't reindent unchanged code? Leave it
> for pgindent to clean that up later.

Sure. It was just too confusing reading the code without reindenting.

Btw, I found git diff/show/blame -w very useful to view reindented code. Actually git show -w seems to produce an applyable patch which doesn't reindent...

Andres

commit e114e3a2708cab8efd64281d09cecbd6303aa329
Author: Andres Freund
Date:   Fri Nov 11 23:32:02 2011 +0100

    Replace a long chain of if's in eval_const_expressions_mutator by a switch()

    For unknown reasons the function used non-chained ifs for every handled
    nodeType. Replacing the if chain with if; else if; ... resulted in a small
    speedup. Replacing it with a switch() in a bigger one.
When testing with a statement containing a longer VALUES statement:

pgbench -M prepared -f stmt -T 10:
orig:   10015.287829
if:     10075.482117
switch: 10246.527402

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 197d9c2..f6f0b11 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -2106,7 +2106,9 @@ eval_const_expressions_mutator(Node *node,
 {
 	if (node == NULL)
 		return NULL;
-	if (IsA(node, Param))
+	switch(nodeTag(node))
+	{
+	case T_Param:
 		{
 			Param *param = (Param *) node;
 
@@ -2152,7 +2154,7 @@ eval_const_expressions_mutator(Node *node,
 		/* Not replaceable, so just copy the Param (no need to recurse) */
 		return (Node *) copyObject(param);
 	}
-	if (IsA(node, FuncExpr))
+	case T_FuncExpr:
 	{
 		FuncExpr *expr = (FuncExpr *) node;
 		List *args;
@@ -2210,7 +2212,7 @@ eval_const_expressions_mutator(Node *node,
 		newexpr->location = expr->location;
 		return (Node *) newexpr;
 	}
-	if (IsA(node, OpExpr))
+	case T_OpExpr:
 	{
 		OpExpr *expr = (OpExpr *) node;
 		List *args;
@@ -2275,7 +2277,7 @@ eval_const_expressions_mutator(Node *node,
 		newexpr->location = expr->location;
 		return (Node *) newexpr;
 	}
-	if (IsA(node, DistinctExpr))
+	case T_DistinctExpr:
 	{
 		DistinctExpr *expr = (DistinctExpr *) node;
 		List *args;
@@ -2372,7 +2374,7 @@ eval_const_expressions_mutator(Node *node,
 		newexpr->location = expr->location;
 		return (Node *) newexpr;
 	}
-	if (IsA(node, BoolExpr))
+	case T_BoolExpr:
 	{
 		BoolExpr *expr = (BoolExpr *) node;
 
@@ -2440,9 +2442,8 @@ eval_const_expressions_mutator(Node *node,
 			break;
 		}
 	}
-	if (IsA(node, SubPlan) ||
-		IsA(node, AlternativeSubPlan))
-	{
+	case T_SubPlan:
+	case T_AlternativeSubPlan:
 		/*
 		 * Return a SubPlan unchanged --- too late to do anything with it.
 		 *
@@ -2450,8 +2451,7 @@ eval_const_expressions_mutator(Node *node,
 		 * never be invoked after SubPlan creation.
 		 */
 		return node;
-	}
-	if (IsA(node, RelabelType))
+	case T_RelabelType:
 	{
 		/*
 		 * If we can simplify the input to a constant, then we don't need the
@@ -2493,7 +2493,7 @@ eval_const_expressions_mutator(Node *node,
 			return (Node *) newrelabel;
 		}
 	}
-	if (IsA(node, CoerceViaIO))
+	case T_CoerceViaIO:
 	{
 		CoerceViaIO *expr = (CoerceViaIO *) node;
 		Expr *arg;
@@ -2569,7 +2569,7 @@ eval_const_expressions_mutator(Node *node,
 		newexpr->location = expr->location;
 		return (Node *) newexpr;
 	}
-	if (IsA(node, ArrayCoerceExpr))
+	case T_ArrayCoerceExpr:
 	{
 		ArrayCoerceExpr *expr = (ArrayCoerceExpr *) node;
 		Expr *arg;
@@ -2607,7 +2607,7 @@ eval_const_expressions_mutator(Node *node,
 		/* Else we must return the partially-simplified node */
 		return (Node *) newexpr;
 	}
-	if (IsA(node, CollateExpr))
+	case T_CollateExpr:
 	{
 		/*
 		 * If we can simplify the input to a constant, then we don't need the
@@ -2652,7 +2652,7 @@ eval_const_expressions_mutator(Node *node,
 			return (Node *) relabel;
 		}
 	}
-	if (IsA(node, CaseExpr))
+	case T_CaseExpr:
 	{
 		/*--
 		 * CASE expressions can be simplified if there are constant
@@ -2783,7 +2783,7 @@ eval_const_expressions_mutator(Node *node,
 		newcase->location = caseexpr->location;
 		return (Node *) newcase;
 	}
-	if (IsA(node, CaseTestExpr))
+	case T_CaseTestExpr:
 	{
 		/*
 		 * If we know a constant test value for the current CASE construct,
@@ -2795,7 +2795,7 @@ eval_const_expressions_mutator(Node *node,
 		else
 			return copyObject(node);
 	}
-	if (IsA(node, ArrayExpr))
+	case T_ArrayExpr:
 	{
 		ArrayExpr *arrayexpr = (ArrayExpr *) node;
 		ArrayExpr *newarray;
@@ -2831,7 +2831,7 @@ eval_const_expressions_mutator(Node *node,
 
 		return (Node *) newarray;
 	}
-	if (IsA(node, CoalesceExpr))
+	case T_CoalesceExpr:
 	{
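[Editor's note: the shape of the transformation in the patch above, reduced to a toy example with invented tag names. A chain of independent IsA() tests compiles to a series of compares, while a switch over the node tag lets the compiler emit a jump table.]

```c
/* Toy model of the rewrite; tag names are invented, not real node types. */
typedef enum
{
	T_PARAM_SKETCH,
	T_FUNCEXPR_SKETCH,
	T_OPEXPR_SKETCH,
	T_BOOLEXPR_SKETCH
} NodeTagSketch;

/* Before: every tag is tested in sequence, even for the last node type. */
static int
mutate_if_chain(NodeTagSketch tag)
{
	if (tag == T_PARAM_SKETCH)
		return 1;
	if (tag == T_FUNCEXPR_SKETCH)
		return 2;
	if (tag == T_OPEXPR_SKETCH)
		return 3;
	if (tag == T_BOOLEXPR_SKETCH)
		return 4;
	return 0;
}

/* After: a single dispatch on the tag, typically compiled to a jump table. */
static int
mutate_switch(NodeTagSketch tag)
{
	switch (tag)
	{
		case T_PARAM_SKETCH:
			return 1;
		case T_FUNCEXPR_SKETCH:
			return 2;
		case T_OPEXPR_SKETCH:
			return 3;
		case T_BOOLEXPR_SKETCH:
			return 4;
	}
	return 0;
}
```

Both functions behave identically; the win Andres measured comes purely from the dispatch cost, which grows linearly in the if-chain version but stays roughly constant with a jump table.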
Re: [HACKERS] RFC: list API / memory allocations
On Friday, November 18, 2011 10:11:29 PM Tom Lane wrote: > Andres Freund writes: > > In many scenarios memory allocation is one of the top 3 functions showing > > up in profiles. Looking at hierarchical profiles > > (-fno-omit-frame-pointer) at least during parsing, planning and executor > > startup most of that is spent around the list API. > > > > Many - especially in the parse-analyze phase - of those allocations can > > be avoided because the lists are immutable and their size is known > > upfront. > > The fundamental problem with all of those proposals is that now you have > some lists in the system that aren't like other lists, and will result > in dumping core if the wrong sorts of operations are applied to them. > I don't particularly care for introducing that kind of fragility into > the system in return for marginal speed gains. I'm not impressed by > Asserts showing that no such thing happens in the cases you tested; > the test coverage won't be complete, and even if it is, innocent-looking > code changes later on could create new problems. Yes, I dislike that myself (as noted). It seems rather fragile, although I think that at least during parsing we could simply refrain from freeing list cells at all. I don't think that the gains are marginal, though. After covering only a small number of cases, there are not-uncommon, non-artificial cases gaining more than 20% parsing speed. One prime example of a workload benefiting hugely is something like SELECT * FROM pg_class WHERE oid = ...; Essentially all queries which request few rows with a large number of columns benefit rather measurably. > Now, if you could do something that *doesn't* restrict what operations > could be applied to the lists, that would be good. If every list cell/header grew another field "allocate_only" (which I currently added for cassert builds only), such cells could simply be skipped when deleting. Some places free list headers directly, but that seems like a bad idea anyway. 
The disadvantage is that those cells would essentially be there until context reset unless released through some special API. Also not that nice... On the other hand, only very few callsites free list(-cells) during parsing anyway. I didn't see a measurable increase in memory usage due to the new field with that approach. The only way out of that seems to be to add refcounted list cells/headers :(. I fear that would be rather complex, expensive and failure prone, so I don't like to go there. > I've wished for a long while that we could allocate the list header and > the first list cell in a single palloc cycle. Yea. Although that is only a rather small portion of the problems/allocations. > This would basically > require getting list_delete_cell to refrain from pfree'ing a cell that > got allocated that way, which is easy as long as you have the list > header at hand, but what happens if the list is later concat'd to > another? A subsequent delete operation would be referring to the other > list header and would come to the wrong conclusion. I don't think any such scheme is safe. > While thinking about this just now, it occurred to me that maybe the > issues could be dodged if the cell, not the header, were first in the > combined palloc block. list_concat is then no longer a problem, as long > as it doesn't try to throw away the second list's header. But I haven't > thought long enough to be sure how well that would work. I don't think that would work without carefully revising list usage all around... Several places remove nodes from a list and then do list_free() on the remainder. Something aside: for my POC memory allocator I added "intrusive" lists, which have the next/prev elements embedded in the stored element. I wonder if some of the list usage could be replaced by such a scheme. Obviously, for every embedded list_node a Node can only be in one list... 
Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inlining comparators as a performance optimisation
On 18 November 2011 05:20, Robert Haas wrote: > I think that we should really consider doing with this patch what Tom > suggested upthread; namely, looking for a mechanism to allow > individual datatypes to offer up a comparator function that doesn't > require bouncing through FunctionCall2Coll(). It seems to me that > duplicating the entire qsort() algorithm is a little iffy. I understand that we highly value extensibility and genericity (yes, that's a real word). We may not always be well served by that tendency. > Sure, in this case it works out to a win. But it's only a small win - > three-quarters of it is in the uncontroversial activity of reducing > the impedance mismatch - and how do we know it will always be a win? Firstly, 1/4 of a quite large gain is still a pretty good gain. Secondly, I probably didn't actually isolate the effects of inlining, nor the overall benefit of the compiler knowing the comparator at compile time. I just removed the inline keyword. Those are two different things. The inline keyword serves as a request to the compiler to inline. The compiler can, and often will, ignore that request. Most people know that. What isn't so widely known is that modern compilers may equally well inline even when they haven't been asked to (but only when they can). When you also consider, as I've already pointed out several times, that inlining decisions are made at individual call sites, it becomes apparent that a certain amount of inlining, and of other optimisations like procedural integration, may well still have been going on at some call sites even without the encouragement of the inline keyword; none of that would have been possible without the benefit of compile-time comparator knowledge. The addition of the inline keyword may just, in this particular case, have the compiler inline even more call sites. 
Posting the ~8ms difference was motivated by a desire to prove that inlining had *some* role to play, without actually going to the trouble of implementing Tom's idea as a basis of comparison, because Tom was very sceptical of inlining. The long and the short of it is that I'm going to have to get my hands dirty with a disassembler before we really know exactly what's happening. That, or I could use an optimisation fence of some type. > Adding more copies of the same code can be an anti-optimization if it > means that a smaller percentage of it fits in the instruction cache, > and sometimes small changes in runtime are caused by random shifts in > the layout of memory that align things more or less favorably across > cache lines rather than by real effects. Now it may well be that this > is a real effect, but will it still look as good when we do this for > 10 data types? For 100 data types? I'd favour limiting it to just the common integer and float types. > In contrast, it seems to me that reducing the impedance mismatch is > something that we could go and do across our entire code base, and > every single data type would benefit from it. It would also be > potentially usable by other sorting algorithms, not just quick sort. Suppose that we went ahead and added that infrastructure. What you must acknowledge is that one reason that this speed-up is so dramatic is that the essential expense of a comparison is already so low - a single instruction - and therefore the overall per-comparison cost goes way down, particularly if the qsort inner loop can store the code across fewer cache lines. For that reason, any absolute improvement that you'll see in complex datatypes will be smaller, maybe much smaller, because for each comparison we'll execute many more instructions that are essential to the comparison. In my estimation, all of this work does not point to there being an undue overhead in the function calling convention as you suggested. 
Still, I'm not opposed to investigating generalising this in some way, reservations notwithstanding, unless we have to block-wait on it. I don't want to chase diminishing returns too far. > Well, that's kind of my point. I think this needs more work before we > decide what the best approach is. Agreed. > So far, the ONLY test result we > have that compares inlining to not inlining shows a speedup from 60 ms > to 52 ms. I think that an 8 ms speedup on one test with one datatype > on one platform/compiler combination isn't sufficient evidence to > conclude that this is the best possible approach. Fair enough, but it's not the only test I did - I posted other numbers for the same query when the table was 48mb, and we saw a proportional improvement, consistent with a per-comparison win. I'm supposed to be on leave for a few days at the moment, so I won't be very active this weekend, but I'm rather curious as to where you or others would like to see me go with benchmarks. I should point out that we currently don't have much idea how big of a win applying these principles could be for index creation times...it could possibly be very significant. My profiling of index
Re: [HACKERS] [PATCH] Replace a long chain of if's in eval_const_expressions_mutator by a switch()
Andres Freund writes: > For unknown reasons the function used non chained ifs for every handled > nodeType. > Replacing the if chain with if; else if; ... resulted in a small > speedup. Replacing it with a switch() in a bigger one. Cool, but this patch is impossible to validate by eye. Could you resubmit a version that doesn't reindent unchanged code? Leave it for pgindent to clean that up later. regards, tom lane
[HACKERS] EXPLAIN (plan off, rewrite off) for benchmarking
Hi, For benchmarking the parser I added the above options (dim suggested this on irc) which proved to be rather useful for me. I added the additional rewrite option because the overhead of copying the tree around makes the profile significantly less expressive. I would also like an option which would only do the actual parsing instead of parse + parse analyse but that seemed a tad more complicated... Is anybody else interested in such EXPLAIN options? Andres
Re: [HACKERS] RFC: list API / memory allocations
Andres Freund writes: > In many scenarios memory allocation is one of the top 3 functions showing up > in profiles. Looking at hierarchical profiles (-fno-omit-frame-pointer) at > least > during parsing, planning and executor startup most of that is spent around > the > list API. > Many - especially in the parse-analyze phase - of those allocations can be > avoided because the lists are immutable and their size is known upfront. The fundamental problem with all of those proposals is that now you have some lists in the system that aren't like other lists, and will result in dumping core if the wrong sorts of operations are applied to them. I don't particularly care for introducing that kind of fragility into the system in return for marginal speed gains. I'm not impressed by Asserts showing that no such thing happens in the cases you tested; the test coverage won't be complete, and even if it is, innocent-looking code changes later on could create new problems. Now, if you could do something that *doesn't* restrict what operations could be applied to the lists, that would be good. I've wished for a long while that we could allocate the list header and the first list cell in a single palloc cycle. This would basically require getting list_delete_cell to refrain from pfree'ing a cell that got allocated that way, which is easy as long as you have the list header at hand, but what happens if the list is later concat'd to another? A subsequent delete operation would be referring to the other list header and would come to the wrong conclusion. While thinking about this just now, it occurred to me that maybe the issues could be dodged if the cell, not the header, were first in the combined palloc block. list_concat is then no longer a problem, as long as it doesn't try to throw away the second list's header. But I haven't thought long enough to be sure how well that would work. 
regards, tom lane
[HACKERS] RFC: list API / memory allocations
Hi List,

In many scenarios memory allocation is one of the top 3 functions showing up in profiles. Looking at hierarchical profiles (-fno-omit-frame-pointer), at least during parsing, planning and executor startup, most of that is spent around the list API. Many - especially in the parse-analyze phase - of those allocations can be avoided, because the lists are immutable and their size is known upfront. Some examples:

parser: list_make$n, buildRelationAliases
planner: expression_tree_mutator, get_relation_info
executor: ExecInitExpr

For that I added new functions/defines which allocate all the needed memory in one hunk:

list_immutable_make$n()
List *list_new_immutable_n(NodeTag t, size_t n);
List *list_make_n(NodeTag t, size_t n, ...);

With those I could avoid about 1/3 of the allocations in some example scenarios (pgbench, custom benchmarks) by replacing a few notable callsites to use statically allocated lists. The obvious problem with that approach is that those Lists and ListCells cannot be individually deallocated - which, especially in the planner, isn't done anyway. To check that, I added an allocate_only flag to both when USE_ASSERT_CHECKING is defined. Using that approach I measured improvements between 0.5-20% depending on the statement (using -M simple). Complex statements which are quick to execute and not too complicated to plan are the ones benefiting most, which is not surprising.

The questions I have here:

* Is that approach acceptable? I converted the most notable callsites and the benefit is quite measurable. On the other hand, one has to be rather careful when analyzing whether lists will be deallocated later on. Also, it touches rather a lot of code.
* I plan to add list_copy_immutable as another function, as that is the next biggest scenario where memory is allocated in foreseeable scenarios.
* Any suggestions what name to use instead of "immutable"? That doesn't really capture the real meaning of "allocate only". I didn't find a better name though. 
Then there is the 2/3 of calls which I couldn't beat with that approach, mostly because the size of the lists is too hard to figure out. My current approach to that would be to preallocate list cells by some caller-defined amount. A *very* quick hack which always overallocates Lists to contain 10 elements yields promising results (around another 2-20% of parse-only time). The problem is that the most sensible way seems to be to define the amount of preallocation per list, which is currently not easily representable because an empty list is just 0/NIL. So I have three solutions, none of which is that great:

* Allow empty Lists. This possibly requires a bit of code churn, but seems somewhat sensible.
* Save the amount of preallocation needed in NIL by defining that no valid list pointer can be less than, say, 0x100. That also requires some churn, and gets points for ugliness.
* Always overallocate new lists. This is rather wasteful.

Any ideas? If anybody wants the preliminary patches I am happy to send them, but I would much rather prefer to rework them after getting some input.

Andres
Re: [HACKERS] Core Extensions relocation
On 11/18/11 12:27 PM, Dimitri Fontaine wrote: > Tom Lane writes: >> Why do you figure that, exactly? The path of least resistance will >> be precisely to leave everything packaged as it is, in a single >> postgresql-contrib module. I'm pretty likely to do that myself for >> Fedora and RHEL. Subdividing/rearranging contrib makes the packager's >> life more complicated, *and* makes his users' lives more complicated, >> if only because things aren't where they were before. It seems unlikely >> to happen, at least in the near term. > > Then if we want packagers to move, what about removing all the > extensions not listed by Greg from the contrib/ directory and inventing > another place where to manage them, which is not automatically built, > but still part of buildfarm tests, if at all possible. Actually, the whole idea is that the "Core Management Extensions" should move from the -contrib module to the -server module. That is, those extensions should always get installed with any server. Of course, packagers may then reasonably ask why that code is not just part of the core? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Core Extensions relocation
Tom Lane writes: > Why do you figure that, exactly? The path of least resistance will > be precisely to leave everything packaged as it is, in a single > postgresql-contrib module. I'm pretty likely to do that myself for > Fedora and RHEL. Subdividing/rearranging contrib makes the packager's > life more complicated, *and* makes his users' lives more complicated, > if only because things aren't where they were before. It seems unlikely > to happen, at least in the near term. Then if we want packagers to move, what about removing all the extensions not listed by Greg from the contrib/ directory and inventing another place where to manage them, which is not automatically built, but still part of buildfarm tests, if at all possible. Then the only change we suggest to packagers is to have the main PostgreSQL package install the contrib one by means of dependencies. I don't much like this solution, but that's how I read your email. The status quo is not a good place to live in. > The upstream project can't force these decisions on packagers, and it > doesn't help to go about under the illusion that we can. Really? You are packaging for RHEL, Dave is responsible for the Windows packaging, Devrim is covering the other RPM systems (apart from SuSE maybe, and I'm not even sure) and Martin is caring for Debian and Ubuntu and following along. We're missing BSD ports packagers, and we're covering like 90% or more of the server and developer installs. We can't force everybody's hands into doing it our way, but I'm pretty sure we can talk to them and see what they think about the usefulness of this proposal and how they intend to react. We're not *that* disconnected. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] testing ProcArrayLock patches
Andres Freund wrote: > When doing line-level profiles I would suggest looking at the > instructions. What's the best way to do that? > I don't think cache line contention is the most likely candidate > here. Simple cache-misses seem far more likely. In combination > with pipeline stalls... > > Newer cpus (nehalem+) can measure stalled cycles which can be > really useful when analyzing performance. I don't remember how to > do that with oprofile right now though as I use perf these days > (its -e stalled-cycles{frontend|backend} there}). When I run oprofile, I still always go back to this post by Tom: http://archives.postgresql.org/pgsql-performance/2009-06/msg00154.php Can anyone provide such a "cheat sheet" for perf? I could give that a try if I knew how. -Kevin
Re: [HACKERS] testing ProcArrayLock patches
On Friday, November 18, 2011 08:36:59 PM Kevin Grittner wrote:
> "Kevin Grittner" wrote:
> > samples  %       image name  symbol name
> > 495463   3.6718  postgres    hash_search_with_hash_value
>
> When lines like these show up in the annotated version, I'm impressed that we're still finding gains as big as we are:
>
>  44613  0.3306 :    if (segp == NULL)
>                :        hash_corrupted(hashp);
> 101910  0.7552 :    keysize = hashp->keysize;    /* ditto */

When doing line-level profiles I would suggest looking at the instructions. Quite often the line shown doesn't have much to do with what is actually executed, as the compiler tries to schedule instructions cleverly. Also, in many situations the shown cost doesn't actually lie in the instruction shown but in an earlier one: the shown instruction has to wait for the results of the earlier instructions, and pipelining makes that hard to observe correctly. A simplified example would be something like:

bool func(int a, int b, int c)
{
    int res = a / b;
    if (res == c)
    {
        return true;
    }
    return false;
}

Likely the instruction showing up in the profile would be the comparison, which obviously is not the really expensive part...

> There goes over 1% of my server run time, right there!
>
> Of course, these make no sense unless there is cache line contention, which is why that area is bearing fruit.

I don't think cache line contention is the most likely candidate here. Simple cache misses seem far more likely, in combination with pipeline stalls... Newer CPUs (Nehalem+) can measure stalled cycles, which can be really useful when analyzing performance. I don't remember how to do that with oprofile right now though, as I use perf these days (it's -e stalled-cycles-{frontend|backend} there).

Andres
Re: [HACKERS] testing ProcArrayLock patches
Robert Haas wrote:
>> I think so. My take was that it was showing 32 of 64 *threads*
>> active -- the hyperthreading funkiness. Is there something in
>> particular you'd like me to check?
>
> Not really, just don't understand the number.

I'm having trouble resolving the vmstat numbers I got during the 32-client pgbench runs which modified data.

-M simple:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
30 1 4464 513492 205564572 5447212400 0 78170 621724 1246300 30 8 61 1 0
27 1 4464 509288 205564572 5447460000 0 125620 599403 1192046 29 8 63 1 0
35 1 4464 508368 205564572 5447699600 0 89801 595939 1186496 29 8 63 0 0
25 0 4464 506088 205564572 5447866800 0 90121 594800 1189649 28 8 63 0 0

-M prepared:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
28 0 5612 1204404 205107344 5423053600 0 93212 527284 1456417 22 9 69 0 0
 8 1 5612 1202044 205107344 542600 0 93217 512819 1417457 21 9 70 1 0
17 1 5612 1201892 205107344 5423604800 0 132699 502333 1412878 21 9 70 0 0
19 1 5612 1199208 205107344 5423893600 0 93612 519113 1484386 21 9 69 0 0

So 60% or 70% idle without any I/O wait time. I don't know how to explain that.

-Kevin
Re: [HACKERS] range_adjacent and discrete ranges
Jeff Davis writes: > On Fri, 2011-11-18 at 10:33 -0500, Tom Lane wrote: >> Would it be better for them to silently transform such cases to "empty"? > I wouldn't like to extend that to int4range(4,3), however. When the > upper bound is less than the lower bound, it's almost certainly a > mistake, and the user should be informed. Yeah, probably not. However, I don't like the idea of '(3,4)'::int4range throwing an error, as it currently does, because it seems to require the application to have quite a lot of knowledge of the range semantics to avoid having errors sprung on it. > By the way, what does this have to do with canonical functions? This > seems more like a constructor issue, and there is already a > zero-argument constructor to make empty ranges. What I was concerned about was whether Florian's idea of implementing range_adjacent by testing for empty intervening range would work, or would fail because of errors getting thrown. regards, tom lane
Re: [HACKERS] testing ProcArrayLock patches
"Kevin Grittner" wrote:
> samples  %       image name  symbol name
> 495463   3.6718  postgres    hash_search_with_hash_value

When lines like these show up in the annotated version, I'm impressed that we're still finding gains as big as we are:

 44613  0.3306 :    if (segp == NULL)
                :        hash_corrupted(hashp);
101910  0.7552 :    keysize = hashp->keysize;    /* ditto */

There goes over 1% of my server run time, right there! Of course, these make no sense unless there is cache line contention, which is why that area is bearing fruit.

-Kevin
Re: [HACKERS] testing ProcArrayLock patches
"anara...@anarazel.de" wrote:
> Kevin Grittner schrieb:
>> samples  %       image name  symbol name
>> 933394   4.9651  postgres    AllocSetAlloc
>> 848476   4.5134  postgres    base_yyparse
>> 719515   3.8274  postgres    SearchCatCache
> That profile looks like you ran pgbench with -m simple. How does
> it look with prepared instead?

samples  %       image name  symbol name
495463   3.6718  postgres    hash_search_with_hash_value
490971   3.6385  postgres    GetSnapshotData
443965   3.2902  postgres    LWLockAcquire
443566   3.2872  postgres    AllocSetAlloc
302388   2.2409  postgres    XLogInsert
286889   2.1261  postgres    SearchCatCache
246417   1.8262  postgres    PostgresMain
235018   1.7417  postgres    heap_page_prune
198442   1.4706  postgres    _bt_compare
181446   1.3447  postgres    hash_any
177131   1.3127  postgres    ExecInitExpr
175775   1.3026  postgres    LWLockRelease
152324   1.1288  postgres    PinBuffer
150285   1.1137  postgres    exec_bind_message
145214   1.0762  postgres    fmgr_info_cxt_security
140493   1.0412  postgres    s_lock
124162   0.9201  postgres    LockAcquireExtended
120429   0.8925  postgres    MemoryContextAlloc
117076   0.8676  postgres    pfree
116493   0.8633  postgres    AllocSetFree
105027   0.7783  postgres    pgstat_report_activity
101407   0.7515  postgres    ProcArrayLockAcquire
100797   0.7470  postgres    MemoryContextAllocZeroAligned
 98360   0.7289  postgres    ProcArrayLockRelease
 86938   0.6443  postgres    heap_hot_search_buffer
 82635   0.6124  postgres    hash_search
 79902   0.5921  postgres    errstart
 79465   0.5889  postgres    HeapTupleSatisfiesVacuum
 78709   0.5833  postgres    ResourceOwnerReleaseInternal
 76068   0.5637  postgres    ExecModifyTable
 73043   0.5413  postgres    heap_update
 72175   0.5349  postgres    strlcpy
 71253   0.5280  postgres    MemoryContextAllocZero

tps = 27392.219364 (including connections establishing)

-Kevin
Re: [HACKERS] testing ProcArrayLock patches
Kevin Grittner schrieb:
>Robert Haas wrote:
>
>> Any chance you can run oprofile (on either branch, don't really
>> care) against the 32 client test and post the results?
>
>Besides the other changes we discussed, I boosted scale to 150 and
>ran at READ COMMITTED isolation level (because all threads promptly
>crashed and burned at REPEATABLE READ -- we desperately need a
>pgbench option to retry a transaction on serialization failure).
>The oprofile hot spots at half a percent or higher:
>
>CPU: Intel Core/i7, speed 2262 MHz (estimated)
>Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with
>a unit mask of 0x00 (No unit mask) count 10
>
>samples  %       image name  symbol name
>933394   4.9651  postgres    AllocSetAlloc
>848476   4.5134  postgres    base_yyparse
>719515   3.8274  postgres    SearchCatCache
>461275   2.4537  postgres    hash_search_with_hash_value
>426411   2.2682  postgres    GetSnapshotData
>322938   1.7178  postgres    LWLockAcquire
>322236   1.7141  postgres    core_yylex
>305471   1.6249  postgres    MemoryContextAllocZeroAligned
>281543   1.4976  postgres    expression_tree_walker
>270241   1.4375  postgres    XLogInsert
>234899   1.2495  postgres    MemoryContextAlloc
>210137   1.1178  postgres    ScanKeywordLookup
>184857   0.9833  postgres    heap_page_prune
>173608   0.9235  postgres    hash_any
>153011   0.8139  postgres    _bt_compare
>144538   0.7689  postgres    nocachegetattr
>131466   0.6993  postgres    fmgr_info_cxt_security
>131001   0.6968  postgres    grouping_planner
>130808   0.6958  postgres    LWLockRelease
>124112   0.6602  postgres    PinBuffer
>120745   0.6423  postgres    LockAcquireExtended
>112992   0.6010  postgres    ExecInitExpr
>112830   0.6002  postgres    lappend
>112311   0.5974  postgres    new_list
>110368   0.5871  postgres    check_stack_depth
>106036   0.5640  postgres    AllocSetFree
>102565   0.5456  postgres    MemoryContextAllocZero
>94689    0.5037  postgres    SearchSysCache

That profile looks like you ran pgbench with -m simple. How does it look with prepared instead? 
Andres
Re: [HACKERS] testing ProcArrayLock patches
Robert Haas wrote: > Any chance you can run oprofile (on either branch, don't really > care) against the 32 client test and post the results? Besides the other changes we discussed, I boosted scale to 150 and ran at READ COMMITTED isolation level (because all threads promptly crashed and burned at REPEATABLE READ -- we desperately need a pgbench option to retry a transaction on serialization failure). The oprofile hot spots at half a percent or higher:

CPU: Intel Core/i7, speed 2262 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 10

samples  %       image name  symbol name
933394   4.9651  postgres    AllocSetAlloc
848476   4.5134  postgres    base_yyparse
719515   3.8274  postgres    SearchCatCache
461275   2.4537  postgres    hash_search_with_hash_value
426411   2.2682  postgres    GetSnapshotData
322938   1.7178  postgres    LWLockAcquire
322236   1.7141  postgres    core_yylex
305471   1.6249  postgres    MemoryContextAllocZeroAligned
281543   1.4976  postgres    expression_tree_walker
270241   1.4375  postgres    XLogInsert
234899   1.2495  postgres    MemoryContextAlloc
210137   1.1178  postgres    ScanKeywordLookup
184857   0.9833  postgres    heap_page_prune
173608   0.9235  postgres    hash_any
153011   0.8139  postgres    _bt_compare
144538   0.7689  postgres    nocachegetattr
131466   0.6993  postgres    fmgr_info_cxt_security
131001   0.6968  postgres    grouping_planner
130808   0.6958  postgres    LWLockRelease
124112   0.6602  postgres    PinBuffer
120745   0.6423  postgres    LockAcquireExtended
112992   0.6010  postgres    ExecInitExpr
112830   0.6002  postgres    lappend
112311   0.5974  postgres    new_list
110368   0.5871  postgres    check_stack_depth
106036   0.5640  postgres    AllocSetFree
102565   0.5456  postgres    MemoryContextAllocZero
 94689   0.5037  postgres    SearchSysCache

Do you want line numbers or lower percentages? Two runs:

tps = 21946.961196 (including connections establishing)
tps = 22911.873227 (including connections establishing)

For write transactions, that seems pretty respectable. 
-Kevin
Re: [HACKERS] COUNT(*) and index-only scans
On 12 October 2011 17:26, Robert Haas wrote: > On Wed, Oct 12, 2011 at 11:59 AM, Tom Lane wrote: >> The place where the decision is actually somewhat hard, IMO, is where >> you're pulling a small part of the table but significantly more than one >> row, and the traditional best choice would be a bitmap scan. Now we >> have to guess whether possibly avoiding heap fetches is better than >> batching the fetches, and it doesn't seem open-and-shut to me. > > Yes, I agree. > > I was actually wondering if there is some way we could make index-only > scans work for bitmap index scans. Something like this: If the index > is covering, then as we retrieve each tuple, we check whether the page > is all-visible. If so, we return the data from the index tuple. If > not, we save the TID for later. Then, when we get done scanning the > index, we go back and fetch all the pages containing saved TIDs in > ascending block number order. The trouble is that I think you could > get in some trouble if you use a TID bitmap here, because if you > lossify the bitmap then you have to make sure you can't return a tuple > that you already handled with the index-only approach (imagine that > the visibility map bit gets cleared partway through the scan). All in > all this seems pretty complicated... So is there a chance of getting bitmap index-only scans? -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935 EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] FlexLocks
On Fri, Nov 18, 2011 at 10:29 PM, Robert Haas wrote: > > So the upside and downside of this approach is that it modifies the > existing LWLock implementation rather than allowing multiple lock > implementations to exist side-by-side. That means every LWLock in the > system has access to this functionality, which might be convenient if > there turn out to be many uses for this technique. Right. > The bad news is > that everyone pays the cost of checking the work queue in > LWLockRelease(). I hope that would be minimal (maybe just one instruction) for those who don't want to use the facility. > It also means that you can't, for example, create a > custom lock with different lock modes (e.g. S, SX, X, as I proposed > upthread). > That's a valid point. > I am pretty dubious that there are going to be very many cases where > we can get away with holding the spinlock while doing the work. For > example, WAL flush is a clear example of where we can optimize away > spinlock acquisitions - if we communicate to people we wake up that > their LSN is already flushed, they needn't reacquire the lock to > check. But we certainly can't hold a spinlock across a WAL flush. > I think that's mostly solvable, as said upthread. We can and should improve this mechanism so that the work is carried out holding the necessary LWLock instead of the spinlock. That would let other processes queue up for the lock while the tasks are being executed. Or, if the tasks only need a shared lock, other normal shared requesters can go ahead and acquire the lock. When I get some time, I would like to see if this can be extended to have shared snapshots, so that multiple callers of GetSnapshotData() get the same snapshot, computed only once by scanning the proc array, instead of having each process compute its own snapshot, which remains the same unless some transaction ends in between.
Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing ProcArrayLock patches
Robert Haas wrote: > Yeah, I'd just drop -S. Easily done. > Make sure to use -c N -j N with pgbench, or you'll probably not be > able to saturate it. Yeah, that's part of the script I copied from you. > I've also had good luck with wal_writer_delay=20ms, although if > you have synchronous_commit=on that might not matter, and it's > much less important since Simon's recent patch in that area went > in. What the heck; will do. > What scale factor are you testing at? 100. Perhaps I should boost that since I'm going as far as 128 clients? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: better support for debugging of overloaded functions
2011/11/18 Robert Haas : > On Fri, Nov 18, 2011 at 6:24 AM, Pavel Stehule > wrote: >> CONTEXT: PL/pgSQL function "assign_rslts" line 50 at assignment (oid: 65903) >> >> \sf+ 65903 > > I'm pretty unenthused by the idea of making OIDs more user-visible > than they already are. If the message is ambiguous, we should include > argument types and (if not the object that would be visible under the > current search_path) a schema qualification. Spitting out a five (or > six or seven or eight) digit number doesn't seem like a usability > improvement. > yes - it's not nice - but it is simple and robust and doesn't depend on the actual search_path setting. A nicer solution is a function signature - it can be assembled when the function is compiled. I see only one disadvantage - a signature can be too wide and can depend on search_path (and search_path can be different when the function is executed and when someone runs the sql console). The signature should be prepared before execution, because there is no access to system tables after an exception. I'd be happy with any solution, because debugging of overloaded functions is terrible now. Regards Pavel > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing ProcArrayLock patches
On Fri, Nov 18, 2011 at 12:45 PM, Kevin Grittner wrote: > OK. Sorry for misunderstanding that. I haven't gotten around to a > deep reading of the patch yet. :-( I based this on the test script > you posted here (with slight modifications for my preferred > directory structures): > > http://archives.postgresql.org/pgsql-hackers/2011-10/msg00605.php > > If I just drop the -S switch will I have a good test, or are there > other adjustments I should make (besides increasing checkpoint > segments)? (Well, for the SELECT-only test I didn't bother putting > pg_xlog on a separate RAID 10 on its own BBU controller as we > normally would for this machine, I'll cover that, too.) Yeah, I'd just drop -S. Make sure to use -c N -j N with pgbench, or you'll probably not be able to saturate it. I've also had good luck with wal_writer_delay=20ms, although if you have synchronous_commit=on that might not matter, and it's much less important since Simon's recent patch in that area went in. What scale factor are you testing at? >> It doesn't make any sense for PostgreSQL master to be using only >> 50% of the CPU and leaving the rest idle on a lots-of-clients >> SELECT-only test. That could easily happen on 9.1, but my lock >> manager changes eliminated the only place where anything gets put >> to sleep in that path (except for the emergency sleeps done by >> s_lock, when a spinlock is really badly contended). So I'm >> confused by these results. Are we sure that the processes are >> being scheduled across all 32 physical cores? > > I think so. My take was that it was showing 32 of 64 *threads* > active -- the hyperthreading funkiness. Is there something in > particular you'd like me to check? Not really, just don't understand the number. >> At any rate, I do think it's likely that you're being bitten by >> spinlock contention, but we'd need to do some legwork to verify >> that and work out the details. 
Any chance you can run oprofile >> (on either branch, don't really care) against the 32 client test >> and post the results? If it turns out s_lock is at the top of the >> heap, I can put together a patch to help figure out which spinlock >> is the culprit. > > oprofile isn't installed on this machine. I'll take care of that > and post results when I can. OK. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing ProcArrayLock patches
Robert Haas wrote: > Kevin Grittner wrote: >>> Then again, is this a regular pgbench test or is this >>> SELECT-only? >> >> SELECT-only > > Ah, OK. I would not expect flexlocks to help with that; Pavan's > patch might, though. OK. Sorry for misunderstanding that. I haven't gotten around to a deep reading of the patch yet. :-( I based this on the test script you posted here (with slight modifications for my preferred directory structures): http://archives.postgresql.org/pgsql-hackers/2011-10/msg00605.php If I just drop the -S switch, will I have a good test, or are there other adjustments I should make (besides increasing checkpoint segments)? (Well, for the SELECT-only test I didn't bother putting pg_xlog on a separate RAID 10 on its own BBU controller as we normally would for this machine; I'll cover that, too.) > It doesn't make any sense for PostgreSQL master to be using only > 50% of the CPU and leaving the rest idle on a lots-of-clients > SELECT-only test. That could easily happen on 9.1, but my lock > manager changes eliminated the only place where anything gets put > to sleep in that path (except for the emergency sleeps done by > s_lock, when a spinlock is really badly contended). So I'm > confused by these results. Are we sure that the processes are > being scheduled across all 32 physical cores? I think so. My take was that it was showing 32 of 64 *threads* active -- the hyperthreading funkiness. Is there something in particular you'd like me to check? > At any rate, I do think it's likely that you're being bitten by > spinlock contention, but we'd need to do some legwork to verify > that and work out the details. Any chance you can run oprofile > (on either branch, don't really care) against the 32 client test > and post the results? If it turns out s_lock is at the top of the > heap, I can put together a patch to help figure out which spinlock > is the culprit. oprofile isn't installed on this machine. I'll take care of that and post results when I can. 
-Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: better support for debugging of overloaded functions
On Fri, Nov 18, 2011 at 6:24 AM, Pavel Stehule wrote: > CONTEXT: PL/pgSQL function "assign_rslts" line 50 at assignment (oid: 65903) > > \sf+ 65903 I'm pretty unenthused by the idea of making OIDs more user-visible than they already are. If the message is ambiguous, we should include argument types and (if not the object that would be visible under the current search_path) a schema qualification. Spitting out a five (or six or seven or eight) digit number doesn't seem like a usability improvement. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing ProcArrayLock patches
On Fri, Nov 18, 2011 at 12:03 PM, Kevin Grittner wrote: >> Then again, is this a regular pgbench test or is this SELECT-only? > > SELECT-only Ah, OK. I would not expect flexlocks to help with that; Pavan's patch might, though. >> Can you by any chance check top or vmstat during the 32-client >> test and see what percentage you have of user time/system >> time/idle time? > > You didn't say whether you wanted master or flexlock, but it turned > out that any difference was way too far into the noise to show. > They both looked like this: >
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b  swpd    free      buff     cache   si  so  bi  bo    in      cs   us sy id wa st
> 38  0  352  1157400 207177020 52360472    0   0   0  16 13345 1190230   40  7 53  0  0
> 37  0  352  1157480 207177020 52360472    0   0   0   0 12953 1263310   40  8 52  0  0
> 36  0  352  1157484 207177020 52360472    0   0   0   0 13411 1233365   38  7 54  0  0
> 37  0  352  1157476 207177020 52360472    0   0   0   0 12780 1193575   41  7 51  0  0
>
> Keep in mind that while there are really 32 cores, the cpu > percentages seem to be based on the "threads" from hyperthreading. > Top showed pgbench (running on the same machine) as eating a pretty > steady 5.2 of the cores, leaving 26.8 cores to actually drive the 32 > postgres processes. It doesn't make any sense for PostgreSQL master to be using only 50% of the CPU and leaving the rest idle on a lots-of-clients SELECT-only test. That could easily happen on 9.1, but my lock manager changes eliminated the only place where anything gets put to sleep in that path (except for the emergency sleeps done by s_lock, when a spinlock is really badly contended). So I'm confused by these results. Are we sure that the processes are being scheduled across all 32 physical cores? At any rate, I do think it's likely that you're being bitten by spinlock contention, but we'd need to do some legwork to verify that and work out the details. 
Any chance you can run oprofile (on either branch, don't really care) against the 32 client test and post the results? If it turns out s_lock is at the top of the heap, I can put together a patch to help figure out which spinlock is the culprit. Anyway, this is probably a digression as it relates to FlexLocks: those are not optimizing for a read-only workload. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing ProcArrayLock patches
"Kevin Grittner" wrote: > We have a 32-core Intel box (4 x X7560 @ 2.27GHz) with 256 GB > RAM. In case anyone cares, this is the same box for which I posted STREAM test results a while back. The PostgreSQL tests seem to peak on this 32-core box at 64 clients, while the STREAM test of raw RAM speed kept increasing up to 128 clients. Overall, though, it's impressive how close PostgreSQL is now coming to the raw RAM access speed curve. http://archives.postgresql.org/pgsql-hackers/2011-08/msg01306.php -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing ProcArrayLock patches
Robert Haas wrote: > Then again, is this a regular pgbench test or is this SELECT-only? SELECT-only > Can you by any chance check top or vmstat during the 32-client > test and see what percentage you have of user time/system > time/idle time? You didn't say whether you wanted master or flexlock, but it turned out that any difference was way too far into the noise to show. They both looked like this:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b  swpd    free      buff     cache   si  so  bi  bo    in      cs   us sy id wa st
38  0  352  1157400 207177020 52360472    0   0   0  16 13345 1190230   40  7 53  0  0
37  0  352  1157480 207177020 52360472    0   0   0   0 12953 1263310   40  8 52  0  0
36  0  352  1157484 207177020 52360472    0   0   0   0 13411 1233365   38  7 54  0  0
37  0  352  1157476 207177020 52360472    0   0   0   0 12780 1193575   41  7 51  0  0

Keep in mind that while there are really 32 cores, the cpu percentages seem to be based on the "threads" from hyperthreading. Top showed pgbench (running on the same machine) as eating a pretty steady 5.2 of the cores, leaving 26.8 cores to actually drive the 32 postgres processes. > What OS are you running?

Linux new-CIR 2.6.32.43-0.4-default #1 SMP 2011-07-14 14:47:44 +0200 x86_64 x86_64 x86_64 GNU/Linux

SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 1

-Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FlexLocks
On Fri, Nov 18, 2011 at 6:26 AM, Pavan Deolasee wrote: > My apologies for hijacking the thread, but the work seems quite > related, so I thought I should post here instead of starting a new > thread. > > Here is a WIP patch based on the idea of having a shared Q. A process > trying to access the shared memory protected by an LWLock sets up the > task in its PGPROC and calls a new API, LWLockExecute(). If the LWLock > is available, the task is performed immediately and the function > returns. Otherwise, the process queues itself up on the lock. When the > last shared lock holder or the exclusive lock holder calls > LWLockRelease(), it scans through such pending tasks, executes them > via a callback mechanism and wakes all those processes along with any > other normal waiter(s) waiting on LWLockAcquire(). > > I have only coded for ProcArrayEndTransaction, but it should be fairly > easy to extend the usage to some more places, especially those which > do some simple modifications to the protected area. I don't propose > to use the technique for every user of LWLock, but there can be some > obvious candidates, including this one that Robert found. > > I see a 35-40% improvement for 32-80 clients on a 5-minute pgbench -N > run with a scale factor of 100 and permanent tables. This is on a > 32-core HP IA box. > > There are a few things that need some deliberation. The pending tasks > are right now executed while holding the mutex (spinlock). This is > good and bad for obvious reasons. We can possibly change that so that > the work is done without holding the spinlock, or leave it to the caller > to choose the behavior. Doing it without holding the spinlock will > make the technique interesting for many more callers. We can also > rework the task execution so that pending similar requests from > multiple callers can be combined and executed with a single callback, > if the caller knows it's safe to do so. 
I haven't thought through the > API/callback changes to support that, but it's definitely possible and > could be quite useful in many cases. For example, the status of many > transactions can be checked with a single lookup of the ProcArray. Or > WAL inserts from multiple processes can be combined and written at > once. So the upside and downside of this approach is that it modifies the existing LWLock implementation rather than allowing multiple lock implementations to exist side-by-side. That means every LWLock in the system has access to this functionality, which might be convenient if there turn out to be many uses for this technique. The bad news is that everyone pays the cost of checking the work queue in LWLockRelease(). It also means that you can't, for example, create a custom lock with different lock modes (e.g. S, SX, X, as I proposed upthread). I am pretty dubious that there are going to be very many cases where we can get away with holding the spinlock while doing the work. For example, WAL flush is a clear example of where we can optimize away spinlock acquisitions - if we communicate to people we wake up that their LSN is already flushed, they needn't reacquire the lock to check. But we certainly can't hold a spinlock across a WAL flush. The nice thing about the FlexLock approach is that it permits fine-grained control over these types of policies: one lock type can switch to exclusive mode, do the work, and then reacquire the spinlock to hand off the baton; another can do the work while holding the spinlock; and still a third can forget about work queues altogether but introduce additional lock modes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] range_adjacent and discrete ranges
On Fri, 2011-11-18 at 13:32 +0100, Florian Pflug wrote: > That information, however, *is* already contained in the canonical > functions, because those function know that (2,3) are empty as an integer > range, but non-empty as a float range. Very good point. Thank you. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] range_adjacent and discrete ranges
On Fri, 2011-11-18 at 10:33 -0500, Tom Lane wrote: > regression=# select int4range(4,4,'(]'); > ERROR: range lower bound must be less than or equal to range upper bound > regression=# select int4range(4,4,'()'); > ERROR: range lower bound must be less than or equal to range upper bound > > Would it be better for them to silently transform such cases to "empty"? That had crossed my mind, but I read the first as saying that it includes 4 and doesn't include 4, which is a little confusing. But I wouldn't object to making them return empty ranges. Seeing that we removed some other errors in favor of returning something, it might be a little more consistent to return empty when possible. I wouldn't like to extend that to int4range(4,3), however. When the upper bound is less than the lower bound, it's almost certainly a mistake, and the user should be informed. By the way, what does this have to do with canonical functions? This seems more like a constructor issue, and there is already a zero-argument constructor to make empty ranges. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] OidFunctionCall* returning null.
I just found this thread: http://archives.postgresql.org/pgsql-general/2011-11/msg00424.php So I'll use the same workaround. Nothing to see here, folks, move along d On Fri, Nov 18, 2011 at 11:17 AM, David Zwarg wrote: > Hello, > > I have been working with the PostGIS developers, and I'm implementing a > facility to use 'callback' functions to process cells in a raster image. > > I have implemented this behind the scenes as a C function that calls a > provided sql regprocedure with OidFunctionCall*. I have been reading the > docs, and it states that "Note that neither arguments nor result are > allowed to be NULL." > > In this use case, there are legitimate reasons to return NULL from a > 'callback' function -- is there an alternative method that I should use, > instead of OidFunctionCall*? > > Thanks, > David >
Re: [HACKERS] testing ProcArrayLock patches
On Fri, Nov 18, 2011 at 11:26 AM, Kevin Grittner wrote: > Robert Haas wrote: >> Nate Boley's AMD 6128 box (which has 32 cores) and an HP Integrity >> server (also with 32 cores). > >> [clear improvement with flexlock patch] > > Hmm. We have a 32-core Intel box (4 x X7560 @ 2.27GHz) with 256 GB > RAM. It's about a week from going into production, at which point > it will be extremely hard to schedule such tests, but for a few days > more I've got shots at it. The flexlock patch doesn't appear to be > such a clear win here. > > I started from Robert's tests, but used these settings so that I > could go to higher client counts and better test serializable > transactions. Everything is fully cached. >
> max_connections = 200
> max_pred_locks_per_transaction = 256
> shared_buffers = 8GB
> maintenance_work_mem = 1GB
> checkpoint_segments = 30
> checkpoint_timeout = 15min
> checkpoint_completion_target = 0.9
> seq_page_cost = 0.1
> random_page_cost = 0.1
> cpu_tuple_cost = 0.05
> effective_cache_size = 40GB
> default_transaction_isolation = '$iso'

I had a dismaying benchmarking experience recently that involved settings very similar to the ones you've got there - in particular, I also had checkpoint_segments set to 30. When I raised it to 300, performance improved dramatically at 8 clients and above. Then again, is this a regular pgbench test or is this SELECT-only? Because the absolute numbers you're posting are vastly higher than anything I've ever seen on a write test. Can you by any chance check top or vmstat during the 32-client test and see what percentage you have of user time/system time/idle time? What OS are you running? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing ProcArrayLock patches
Robert Haas wrote: > Nate Boley's AMD 6128 box (which has 32 cores) and an HP Integrity > server (also with 32 cores). > [clear improvement with flexlock patch] Hmm. We have a 32-core Intel box (4 x X7560 @ 2.27GHz) with 256 GB RAM. It's about a week from going into production, at which point it will be extremely hard to schedule such tests, but for a few days more I've got shots at it. The flexlock patch doesn't appear to be such a clear win here. I started from Robert's tests, but used these settings so that I could go to higher client counts and better test serializable transactions. Everything is fully cached.

max_connections = 200
max_pred_locks_per_transaction = 256
shared_buffers = 8GB
maintenance_work_mem = 1GB
checkpoint_segments = 30
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
seq_page_cost = 0.1
random_page_cost = 0.1
cpu_tuple_cost = 0.05
effective_cache_size = 40GB
default_transaction_isolation = '$iso'

Serializable results not shown here -- that's to gather information for trying to improve SSI locking. 
m1   tps = 7847.834544 (including connections establishing)
f1   tps = 7917.225382 (including connections establishing)
m2   tps = 18672.145526 (including connections establishing)
f2   tps = 17486.435322 (including connections establishing)
m4   tps = 34371.278253 (including connections establishing)
f4   tps = 34465.898173 (including connections establishing)
m8   tps = 68228.261694 (including connections establishing)
f8   tps = 68505.285830 (including connections establishing)
m16  tps = 127449.815100 (including connections establishing)
f16  tps = 127208.939670 (including connections establishing)
m32  tps = 201738.209348 (including connections establishing)
f32  tps = 201637.237903 (including connections establishing)
m64  tps = 380326.800557 (including connections establishing)
f64  tps = 380628.429408 (including connections establishing)
m80  tps = 366628.197546 (including connections establishing)
f80  tps = 162594.012051 (including connections establishing)
m96  tps = 360922.948775 (including connections establishing)
f96  tps = 366728.987041 (including connections establishing)
m128 tps = 352159.631878 (including connections establishing)
f128 tps = 355475.129448 (including connections establishing)

I did five runs each and took the median. In most cases, the values were pretty close to one another in a group, so confidence is pretty high that this is meaningful. There were a few anomalies where performance for one or more samples was horrid. This seems consistent with the theory of pathological pileups on the LW locks (or also flexlocks?). 
The problem groups:

m64 tps = 380407.768906 (including connections establishing)
m64 tps = 79197.470389 (including connections establishing)
m64 tps = 381112.194105 (including connections establishing)
m64 tps = 378579.036542 (including connections establishing)
m64 tps = 380326.800557 (including connections establishing)

m96 tps = 360582.945291 (including connections establishing)
m96 tps = 363021.805138 (including connections establishing)
m96 tps = 362468.870516 (including connections establishing)
m96 tps = 59614.322351 (including connections establishing)
m96 tps = 360922.948775 (including connections establishing)

f80 tps = 158905.149822 (including connections establishing)
f80 tps = 157192.460599 (including connections establishing)
f80 tps = 370757.790443 (including connections establishing)
f80 tps = 162594.012051 (including connections establishing)
f80 tps = 372170.638516 (including connections establishing)

f96 tps = 366804.733788 (including connections establishing)
f96 tps = 366728.987041 (including connections establishing)
f96 tps = 365490.380848 (including connections establishing)
f96 tps = 366770.193305 (including connections establishing)
f96 tps = 125225.371140 (including connections establishing)

So the lows don't seem to be as low when they happen with the flexlock patch, but they still happen -- possibly more often? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] OidFunctionCall* returning null.
Hello, I have been working with the PostGIS developers, and I'm implementing a facility to use 'callback' functions to process cells in a raster image. I have implemented this behind the scenes as a C function that calls a provided sql regprocedure with OidFunctionCall*. I have been reading the docs, and it states that "Note that neither arguments nor result are allowed to be NULL." In this use case, there are legitimate reasons to return NULL from a 'callback' function -- is there an alternative method that I should use, instead of OidFunctionCall*? Thanks, David
Re: [HACKERS] WIP: Collecting statistics on CSV file data
2011/11/18 Shigeru Hanada : > - I couldn't see the reason why file_fdw sets ctid of sample tuples, > though I guess it's for Vitter's random sampling algorithm. If every > FDW must set valid ctid to sample tuples, it should be mentioned in > document of AnalyzeForeignTable. Exporting some functions from > analyze.c relates this issue? If every FDW must set valid ctid to sample tuples, it should be fixed so that they don't have to, I would think. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] range_adjacent and discrete ranges
Florian Pflug writes: > ...This definition does not depend on any specific canonical form of ranges, > only on the canonicalize function's ability to detect empty ranges. Hmm, well, now that you mention it, I don't think the current canonical functions handle empty ranges very nicely at all. They tend to spit up:

regression=# select int4range(4,4,'[]');
 int4range
-----------
 [4,5)
(1 row)

regression=# select int4range(4,4,'(]');
ERROR:  range lower bound must be less than or equal to range upper bound
regression=# select int4range(4,4,'()');
ERROR:  range lower bound must be less than or equal to range upper bound

Would it be better for them to silently transform such cases to "empty"? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] VACUUM touching file but not updating relation
On Fri, Nov 18, 2011 at 3:18 PM, Tom Lane wrote: > What Thom's complaining about is that the buffer may be marked dirty > unnecessarily, ie when there has been no actual data change. OK, I'll patch it. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] VACUUM touching file but not updating relation
Simon Riggs writes: > On Fri, Nov 18, 2011 at 2:47 PM, Tom Lane wrote: >> Well, it's expected given the current coding in the btree vacuum logic. >> It's not clear to me why it was written like that, though. > The code works as designed. > _bt_delitems_vacuum() is only ever called with nitems == 0 when it is > the last block of the relation with wal_level = hot standby > As discussed in the comments we must issue a WAL record for the last > block, whatever else has occurred. > So the correct number of WAL records is emitted and I see no bug there. What Thom's complaining about is that the buffer may be marked dirty unnecessarily, ie when there has been no actual data change. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] VACUUM touching file but not updating relation
On Fri, Nov 18, 2011 at 2:47 PM, Tom Lane wrote: > Thom Brown writes: >>> On 11 November 2011 23:28, Tom Lane wrote: I observe that _bt_delitems_vacuum() unconditionally dirties the page and writes a WAL record, whether it has anything to do or not; and that if XLogStandbyInfoActive() then btvacuumscan will indeed call it despite there being (probably) nothing useful to do. Seems like that could be improved. The comment explaining why it's necessary to do that doesn't make any sense to me, either. > >>> Well the effect, in the single instances I've checked, is certainly >>> more pronounced for hot_standby, but there still appears to be some >>> occurrences for minimal wal_level too. > >> So would you say this is acceptable and normal activity, or is >> something awry here? > > Well, it's expected given the current coding in the btree vacuum logic. > It's not clear to me why it was written like that, though. The code works as designed. _bt_delitems_vacuum() is only ever called with nitems == 0 when it is the last block of the relation with wal_level = hot standby As discussed in the comments we must issue a WAL record for the last block, whatever else has occurred. So the correct number of WAL records is emitted and I see no bug there. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] VACUUM touching file but not updating relation
On Fri, Nov 18, 2011 at 2:47 PM, Tom Lane wrote: > Thom Brown writes: >>> On 11 November 2011 23:28, Tom Lane wrote: I observe that _bt_delitems_vacuum() unconditionally dirties the page and writes a WAL record, whether it has anything to do or not; and that if XLogStandbyInfoActive() then btvacuumscan will indeed call it despite there being (probably) nothing useful to do. Seems like that could be improved. The comment explaining why it's necessary to do that doesn't make any sense to me, either. > >>> Well the effect, in the single instances I've checked, is certainly >>> more pronounced for hot_standby, but there still appears to be some >>> occurrences for minimal wal_level too. > >> So would you say this is acceptable and normal activity, or is >> something awry here? > > Well, it's expected given the current coding in the btree vacuum logic. > It's not clear to me why it was written like that, though. I'll take a look. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Do missed autoheader run for previous commit.
Alvaro Herrera writes: > Excerpts from Tom Lane's message of vie nov 18 11:12:32 -0300 2011: >>> Hmm, does the win32 file need updating too? >> I don't see HAVE_SCANDIR in there, do you? > Well, I wonder if the win32 file needs to be hooked to the whole > autoconf/autoheader thing somehow. I mean, if HAVE_SCANDIR wasn't > there, does this mean that when it was added to pg_config.h we forgot to > update the win32 copy? Well, it might mean that, or it might mean that somebody intentionally didn't bother to add the symbol because Windows hasn't got the function. Yeah, it would be nice if autoconf worked for Windows ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] VACUUM touching file but not updating relation
Thom Brown writes: >> On 11 November 2011 23:28, Tom Lane wrote: >>> I observe that _bt_delitems_vacuum() unconditionally dirties the page >>> and writes a WAL record, whether it has anything to do or not; and that >>> if XLogStandbyInfoActive() then btvacuumscan will indeed call it despite >>> there being (probably) nothing useful to do. Seems like that could be >>> improved. The comment explaining why it's necessary to do that doesn't >>> make any sense to me, either. >> Well the effect, in the single instances I've checked, is certainly >> more pronounced for hot_standby, but there still appears to be some >> occurrences for minimal wal_level too. > So would you say this is acceptable and normal activity, or is > something awry here? Well, it's expected given the current coding in the btree vacuum logic. It's not clear to me why it was written like that, though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] vpath builds and verbose error messages
Alvaro Herrera writes: > Excerpts from Peter Eisentraut's message of vie nov 18 01:34:18 -0300 2011: >> When using verbose error messages (psql \set VERBOSITY verbose) with a >> vpath build, you get this sort of thing: >> LOCATION: transformColumnRef, >> /build/buildd-postgresql-9.1_9.1.1-3-i386-AP0ovQ/postgresql-9.1-9.1.1/build/../src/backend/parser/parse_expr.c:766 >> >> Can we shorten that path somehow? Either in the C code when printing it >> out, or during the build. Any ideas? > Huh, I hadn't noticed, even though I see those all the time. I agree > with shortening the path so that it's relative to the root source dir, > if that's what you propose: In a non-vpath build you just get the file name (or at least that's what I'm used to seeing). I agree with Peter that a full path seems a bit over the top. It wouldn't be that hard for elog.c to do strrchr(fname, '/') or something like that, but the idea that there are hundreds of full-path strings embedded in the executable is a bit annoying. I guess we could hope that the compiler is bright enough to store it only once per source file, so the actual space requirement may not be all *that* bad. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Core Extensions relocation
Greg Smith writes: > On 11/17/2011 03:18 PM, Peter Eisentraut wrote: >> Who's to say that after this, the core extensions won't end up in a new >> separate package postgresql-extensions (or similar) or might even stay >> in postgresql-contrib, for compatibility? > I don't know why packagers would make an active decision that will make > their lives more difficult, with no benefit to them and a regression > against project recommendations for their users. Why do you figure that, exactly? The path of least resistance will be precisely to leave everything packaged as it is, in a single postgresql-contrib module. I'm pretty likely to do that myself for Fedora and RHEL. Subdividing/rearranging contrib makes the packager's life more complicated, *and* makes his users' lives more complicated, if only because things aren't where they were before. It seems unlikely to happen, at least in the near term. > And if some wanted to wander this way, they'll end up having to maintain > a doc patch to address the fact that they've broken with project > recommendations. This text in what I submitted will no longer be true: You're assuming anybody will notice or care about that text, if indeed it gets committed/released with that wording at all. The upstream project can't force these decisions on packagers, and it doesn't help to go about under the illusion that we can. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RangeVarGetRelid()
On Fri, Nov 18, 2011 at 08:58:30AM -0500, Robert Haas wrote: > On Fri, Nov 18, 2011 at 8:37 AM, Noah Misch wrote: > > I count 1/25 callers overriding nowait and 3/25 overriding missing_ok. So, > > it's > > looking like a less-common override than the callback function will come to > > be. > > Yeah, you're probably right. However, I think there's another good > reason not to use that signature: in 9.1, the function had a signature > of (RangeVar *, bool). If in 9.2 it ends up with a signature of > (RangeVar *, LOCKMODE), you won't get a compiler warning (at least not > on my system) if you invoke it as RangeVarGetRelid(rel, true). You'll > just get silently (and subtly) different behavior: an error if the > relation doesn't exist (instead of no error), and an AccessShareLock > if it does (instead of no lock). I think we're best off making sure > that any old-style usage of RangeVarGetRelid() that may be out there > in third-party code fails to compile. Good call. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inlining comparators as a performance optimisation
Simon Riggs writes: > On Fri, Nov 18, 2011 at 5:20 AM, Robert Haas wrote: >> I think that we should really consider doing with this patch what Tom >> suggested upthread; namely, looking for a mechanism to allow >> individual datatypes to offer up a comparator function that doesn't >> require bouncing through FunctionCall2Coll(). > I don't think its credible to implement that kind of generic > improvement at this stage of the release cycle. Er, *what*? We're in mid development cycle, we are nowhere near release. When exactly would you have us make major changes? In any case, what I understood Robert to be proposing was an add-on feature that could be implemented in one datatype at a time. Not a global flag day. We couldn't really do the latter anyway without making life very unpleasant for authors of extension datatypes. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RangeVarGetRelid()
On Fri, Nov 18, 2011 at 8:37 AM, Noah Misch wrote: > I count 1/25 callers overriding nowait and 3/25 overriding missing_ok. So, > it's > looking like a less-common override than the callback function will come to > be. Yeah, you're probably right. However, I think there's another good reason not to use that signature: in 9.1, the function had a signature of (RangeVar *, bool). If in 9.2 it ends up with a signature of (RangeVar *, LOCKMODE), you won't get a compiler warning (at least not on my system) if you invoke it as RangeVarGetRelid(rel, true). You'll just get silently (and subtly) different behavior: an error if the relation doesn't exist (instead of no error), and an AccessShareLock if it does (instead of no lock). I think we're best off making sure that any old-style usage of RangeVarGetRelid() that may be out there in third-party code fails to compile. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] VACUUM touching file but not updating relation
On 12 November 2011 00:08, Thom Brown wrote: > On 11 November 2011 23:28, Tom Lane wrote: >> Thom Brown writes: >>> On 11 November 2011 00:55, Tom Lane wrote: Thom Brown writes: > I just noticed that the VACUUM process touches a lot of relations > (affects mtime) but for one file I looked at, it didn't change. This > doesn't always happen, and many relations aren't touched at all. >> No immediate ideas as to why the mtime would change if the file contents didn't. It seems like there must be a code path that marked a buffer dirty without having changed it, but we're usually pretty careful about that. >> >>> I checked all files where the time stamp of the file had changed, but >>> had the same MD5 sum. I used the list in the query you mentioned and >>> get: [ mostly indexes ] >> >> Hmm, is this on a hot standby master? > > It's using a wal_level of hot_standby and has max_wal_senders set to > 2, but it's not actually replicating to anywhere else. But if I > comment out both of these, restart, then compare pre-vacuum and > post-vacuum, I get the following results for unchanged but touched > items: > > test=# select oid,relname from pg_class where relfilenode in > (11680,11682,11684,11686,11690,16530); > oid | relname > ---+- > 2619 | pg_statistic > 2840 | pg_toast_2619 > 2841 | pg_toast_2619_index > 16530 | cows2 > (4 rows) > > The items which didn't match a result in this instance were 11686 and > 11690, which is surprising since they both have a visibility map and > free space map, indicating they're some kind of table. > >> I observe that _bt_delitems_vacuum() unconditionally dirties the page >> and writes a WAL record, whether it has anything to do or not; and that >> if XLogStandbyInfoActive() then btvacuumscan will indeed call it despite >> there being (probably) nothing useful to do. Seems like that could be >> improved. The comment explaining why it's necessary to do that doesn't >> make any sense to me, either. 
> > Well the effect, in the single instances I've checked, is certainly > more pronounced for hot_standby, but there still appears to be some > occurrences for minimal wal_level too. So would you say this is acceptable and normal activity, or is something awry here? -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935 EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inlining comparators as a performance optimisation
On Fri, Nov 18, 2011 at 3:53 AM, Simon Riggs wrote: > We have no proof that we need to do this for 10 or 100 data types. We > only currently have proof that there is gain for the most common > types. Well, that's kind of my point. I think this needs more work before we decide what the best approach is. So far, the ONLY test result we have that compares inlining to not inlining shows a speedup from 60 ms to 52 ms. I think that an 8 ms speedup on one test with one datatype on one platform/compiler combination isn't sufficient evidence to conclude that this is the best possible approach. I think the way to look at this is that this patch contains two possibly good ideas: one of which is to make the qsort() argument match the qsort() calling convention, and the other of which is to have multiple copies of the qsort() logic that inline the comparators for their respective datatypes. Tom hypothesized that most of the benefit was in the first idea, and the numbers that Peter posted seem to support that conclusion. The first idea is also less invasive and more broadly applicable, so to me that seems like the first thing to pursue. Now, that doesn't mean that we shouldn't consider the second idea as well, but I don't believe that the evidence presented thus far is sufficient to prove that we should go that route. It seems entirely possible that inlining any non-trivial comparator function could work out to a loss, or that the exact choice of compiler flags could affect whether inlining works out to a plus or a minus (what happens with -O3 vs. -O2? what about -O1? what about -O2 -fno-omit-frame-pointer? what if they're using HP's aCC or Intel's icc rather than gcc?). There's no point in installing an optimization that could easily be a pessimization on some other workload, and that hasn't been tested, or at least no results have been posted here. 
On the other hand, matching the calling convention of the comparator function to what qsort() wants and eliminating the trampoline seems absolutely certain to be a win in every case, and based on the work Peter has done it seems like it might be a quite large win. In fact, you have to ask yourself just exactly how much our function-calling convention is costing us in general. We use that mechanism an awful lot and whatever loss of cycles is involved would be spread all over the code base where oprofile or gprof won't easily be able to pick it out. Even the cost at any particular call site will be split between the caller and the callee. There might be more stuff we could optimize here than just sorting... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RangeVarGetRelid()
On Thu, Nov 17, 2011 at 11:49:06PM -0500, Robert Haas wrote: > On Thu, Nov 17, 2011 at 10:48 PM, Noah Misch wrote: > > On Thu, Nov 17, 2011 at 08:59:58PM -0500, Robert Haas wrote: > >> --- a/src/include/catalog/namespace.h > >> +++ b/src/include/catalog/namespace.h > >> @@ -47,9 +47,15 @@ typedef struct OverrideSearchPath > >> bool addTemp; /* implicitly prepend temp > >> schema? */ > >> } OverrideSearchPath; > >> > >> +typedef void (*RangeVarGetRelidCallback)(const RangeVar *relation, Oid > >> relId, > >> + Oid oldRelId); > >> > >> -extern Oid RangeVarGetRelid(const RangeVar *relation, LOCKMODE lockmode, > >> - bool missing_ok, bool nowait); > >> +#define RangeVarGetRelid(relation, lockmode, missing_ok, nowait) \ > >> + RangeVarGetRelidExtended(relation, lockmode, missing_ok, nowait, NULL) > >> + > >> +extern Oid RangeVarGetRelidExtended(const RangeVar *relation, > >> + LOCKMODE lockmode, bool missing_ok, bool nowait, > >> + RangeVarGetRelidCallback callback); > > > > I like the split of RangeVarGetRelidExtended from RangeVarGetRelid. Shall > > we > > also default missing_ok=false and nowait=false for RangeVarGetRelid? > > I thought about that, but just did it this way for now to keep the > patch small for review purposes. Fair enough. > nowait = true is only used in one > place, so it probably makes sense to default it to false in the > non-extended version. But there are a number of callers who want > missing_ok = true behavior, so I'm inclined not to think that one > should be available in the non-extended version. [assuming the "not" in your last sentence was unintended] I count 1/25 callers overriding nowait and 3/25 overriding missing_ok. So, it's looking like a less-common override than the callback function will come to be. 
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] vpath builds and verbose error messages
Excerpts from Peter Eisentraut's message of vie nov 18 01:34:18 -0300 2011: > When using verbose error messages (psql \set VERBOSITY verbose) with a > vpath build, you get this sort of thing: > > ERROR: 42703: column "foo" does not exist > LINE 1: select foo; >^ > LOCATION: transformColumnRef, > /build/buildd-postgresql-9.1_9.1.1-3-i386-AP0ovQ/postgresql-9.1-9.1.1/build/../src/backend/parser/parse_expr.c:766 > > Can we shorten that path somehow? Either in the C code when printing it > out, or during the build. Any ideas? Huh, I hadn't noticed, even though I see those all the time. I agree with shortening the path so that it's relative to the root source dir, if that's what you propose: LOCATION: transformColumnRef, src/backend/parser/parse_expr.c:766 If it can be done at build time that would be best, because we wouldn't be carrying thousands of useless copies of the source path ... I have no suggestions on how to do this, however. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Review for "Add permission check on SELECT INTO"
The patch is in context diff format and applies cleanly. The functionality is needed because it keeps users from circumventing privilege restrictions, as they can currently do in this case. There is no documentation, which I think is OK since it changes behaviour to work as documented. The patch compiles without warning. It contains a test case, but that test case does not test the feature at all! The expected (and encountered) result is:

CREATE SCHEMA selinto_schema;
CREATE USER selinto_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA selinto_schema REVOKE INSERT ON TABLES FROM selinto_user;
SET SESSION AUTHORIZATION selinto_user;
SELECT * INTO TABLE selinto_schema.tmp1 FROM onek WHERE onek.unique1 < 2; -- Error
ERROR: permission denied for relation onek
RESET SESSION AUTHORIZATION;
DROP SCHEMA selinto_schema CASCADE;
DROP USER selinto_user;

This does not fail because "selinto_user" lacks INSERT permission on "selinto_schema.tmp1" (he doesn't!), but because he lacks SELECT permission on "onek" (as the error message indicates). Setting default privileges on a schema can never revoke default privileges. The patch works as advertised in that it causes SELECT ... INTO and CREATE TABLE ... AS to fail, however the error message is misleading. 
Here is a (correct) test case:

CREATE ROLE laurenz LOGIN;
CREATE TABLE public.test(id, val) AS VALUES (1, 'eins'), (2, 'zwei');
GRANT SELECT ON public.test TO laurenz;
ALTER DEFAULT PRIVILEGES FOR ROLE laurenz REVOKE INSERT ON TABLES FROM laurenz;
SET ROLE laurenz;
CREATE TABLE public.copy1(a, b) AS SELECT * FROM public.test;
ERROR: whole-row update is not implemented
CREATE TABLE public.copy1(a) AS SELECT id FROM public.test;
ERROR: whole-row update is not implemented
SELECT * INTO public.copy2 FROM public.test;
ERROR: whole-row update is not implemented
RESET ROLE;
ALTER DEFAULT PRIVILEGES FOR ROLE laurenz GRANT ALL ON TABLES TO laurenz;
DROP TABLE test;
DROP ROLE laurenz;

The correct error message would be:
ERROR: permission denied for relation copy1

It seems like a wrong code path is taken in ExecCheckRTEPerms, maybe there's something wrong with rte->modifiedCols. I'll mark the patch as "Waiting on Author" until these problems are addressed. Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] range_adjacent and discrete ranges
On Nov18, 2011, at 09:25 , Jeff Davis wrote: > While thinking about range_cmp_bounds, I started to think that the way > range_adjacent works is wrong. > > range_adjacent() depends on the bounds of two ranges to match up, such > that the boundary values are equal, but one is exclusive and the other > inclusive, and one is a lower bound and the other an upper bound. > That makes perfect sense for continuous ranges because that is the only > way they can possibly be adjacent. It also works for the discrete ranges > as defined so far, because they use a canonical function that translates > the values to [) form. But if someone were to write a canonical function > that translates the ranges to [] or () form, range_adjacent would be > useless. Hm, the problem here is for range_adjacent to recognize that [1,2] is adjacent to [3,4] when treated as integer ranges, but that they're not adjacent when treated as float ranges. The reason being, of course, that there isn't any integer between 2 and 3, but there are floats between 2 and 3. That information, however, *is* already contained in the canonical functions, because those functions know that (2,3) is empty as an integer range, but non-empty as a float range. For example, [1,2] is adjacent to [3,4] as integer ranges because (2,3) is empty as an integer range. Conversely, since (2,3) is *not* empty as a float range, [1,2] and [3,4] are *not* adjacent as float ranges. More formally, let there be two arbitrary ranges (a, b, i_a, i_b) and (c, d, i_c, i_d), where the first two parameters are the lower respectively upper bound, and the last two are booleans saying whether the lower respectively upper bound is inclusive (true) or exclusive (false). These ranges are then adjacent exactly if the range (b, c, !i_b, !i_c) is empty. This definition does not depend on any specific canonical form of ranges, only on the canonicalize function's ability to detect empty ranges. 
best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Collecting statistics on CSV file data
(2011/11/18 16:25), Etsuro Fujita wrote: > Thank you for your testing. I updated the patch according to your > comments. Attached is the updated version of the patch. I'd like to share the result of my review even though it's not fully finished. So far I have looked from the viewpoint of API design, code formatting, and documentation. I'll examine the effectiveness of the patch and the details of the implementation next week, and hopefully try writing an ANALYZE handler for pgsql_fdw :) The new patch has the correct format, and it could be applied to the HEAD of the master branch cleanly. All regression tests, including those of contrib modules, have passed. It contains code changes and regression tests related to the issue, and they have enough comments. IMO the documentation in this patch is not enough to show FDW authors how to write an analyze handler, but it can be enhanced afterward. With this patch, an FDW author can provide an optional ANALYZE handler which updates the statistics of foreign tables. The planner would be able to generate better plans by using those statistics. > Yes. But in the updated version, I've refactored analyze.c a little bit > to allow FDW authors to simply call do_analyze_rel(). > The updated version enables FDW authors to just write their own > acquire_sample_rows(). On the other hand, by retaining to hook > AnalyzeForeignTable routine at analyze_rel(), higher level than > acquire_sample_rows() as before, it allows FDW authors to write > AnalyzeForeignTable routine for foreign tables on a remote server to ask > the server for its current stats instead, as pointed out earlier by Tom > Lane. IIUC, this patch offers three options to FDWs: a) set AnalyzeForeignTable to NULL to indicate lack of capability, b) provide an AnalyzeForeignTable which calls do_analyze_rel with a custom sample_row_acquirer, and c) create statistics data from scratch in the FDW itself by doing similar things to do_analyze_rel with the given arguments, or copy statistics from a remote PostgreSQL server. 
ISTM that this design is well balanced between simplicity and flexibility. Maybe these three options would suit web-based wrappers, file-based or RDBMS wrappers, and pgsql_fdw respectively. I think that adding more details about FdwRoutine, such as the purpose of the new callback function and how it differs from the required ones, would help FDW authors, including me :) I have some random comments:

- I think a separate typedef for sample_acquire_rows would make the code more readable. In addition, the parameters of the function should be described explicitly.
- I couldn't see the reason why file_fdw sets the ctid of sample tuples, though I guess it's for Vitter's random sampling algorithm. If every FDW must set a valid ctid on sample tuples, it should be mentioned in the documentation of AnalyzeForeignTable. Does exporting some functions from analyze.c relate to this issue?
- Why does file_fdw skip sample tuples which have a NULL value? AFAIS the original acquire_sample_rows doesn't do so.
- Some comment lines go past 80 columns.
- Some headers included in file_fdw.c seem unnecessary.

Regards, -- Shigeru Hanada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FlexLocks
On Thu, Nov 17, 2011 at 10:19 AM, Pavan Deolasee wrote: > On Thu, Nov 17, 2011 at 10:01 AM, Robert Haas wrote: > >> >> I am not convinced that that's a better API. I mean, consider >> something like this: >> >> /* >> * OK, let's do it. First let other backends know I'm in ANALYZE. >> */ >> LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); >> MyProc->vacuumFlags |= PROC_IN_ANALYZE; >> LWLockRelease(ProcArrayLock); > >> I'm not sure exactly how you'd proposed to rewrite that, but I think >> it's almost guaranteed to be more than three lines of code. > > I would guess the ReqRes will look something like this where > ReqResRequest/Response would probably be union of all various requests > and responses, one for each type of request: > > struct ReqRes { > ReqResRequestType reqtype; > ReqResRequest req; > ReqResResponse res; > } > > The code above can be rewritten as: > > reqRes.reqtype = RR_PROC_SET_VACUUMFLAGS; > reqRes.req.set_vacuumflags.flags = PROC_IN_ANALYZE; > LWLockExecute(ProcArrayLock, LW_EXCLUSIVE, &reqRes); > My apologies for hijacking the thread, but the work seems quite related, so I thought I should post here instead of starting a new thread. Here is a WIP patch based on the idea of having a shared Q. A process trying to access the shared memory protected by an LWLock sets up the task in its PGPROC and calls a new API, LWLockExecute(). If the LWLock is available, the task is performed immediately and the function returns. Otherwise, the process queues itself up on the lock. When the last shared lock holder or the exclusive lock holder calls LWLockRelease(), it scans through such pending tasks, executes them via a callback mechanism and wakes all those processes along with any other normal waiter(s) waiting on LWLockAcquire(). I have only coded for ProcArrayEndTransaction, but it should be fairly easy to extend the usage to some more places, especially those which do some simple modifications to the protected area. 
I don't propose to use the technique for every user of LWLock, but there can be some obvious candidates, including this one that Robert found out. I see a 35-40% improvement for 32-80 clients on a 5-minute pgbench -N run with scale factor of 100 and permanent tables. This is on a 32-core HP IA box. There are a few things that need some deliberation. The pending tasks are right now executed while holding the mutex (spinlock). This is good and bad for obvious reasons. We can possibly change that so that the work is done without holding the spinlock, or leave it to the caller to choose the behavior. Doing it without holding the spinlock will make the technique interesting for many more callers. We can also rework the task execution so that similar pending requests from multiple callers can be combined and executed with a single callback, if the caller knows it's safe to do so. I haven't thought through the API/callback changes to support that, but it's definitely possible and could be quite useful in many cases. For example, the status of many transactions can be checked with a single lookup of the ProcArray. Or WAL inserts from multiple processes can be combined and written at once. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com commit 24f8e349d085e646cb918c552cc8ead7d38f7013 Author: Pavan Deolasee Date: Fri Nov 18 15:49:54 2011 +0530 Implement a shared work Q mechanism. A process can queue its work for later execution if the protecting lock is currently not available. Backend which releases the last lock will finish the work and wake up the waiting process. 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c index 1a48485..59d2958 100644 --- a/src/backend/storage/ipc/procarray.c +++ b/src/backend/storage/ipc/procarray.c @@ -157,6 +157,7 @@ static int KnownAssignedXidsGetAndSetXmin(TransactionId *xarray, TransactionId xmax); static TransactionId KnownAssignedXidsGetOldestXmin(void); static void KnownAssignedXidsDisplay(int trace_level); +static bool ProcArrayEndTransactionWQ(WorkQueueData *wqdata); /* * Report shared-memory space needed by CreateSharedProcArray. @@ -331,8 +332,6 @@ ProcArrayRemove(PGPROC *proc, TransactionId latestXid) elog(LOG, "failed to find proc %p in ProcArray", proc); } - - /* * ProcArrayEndTransaction -- mark a transaction as no longer running * @@ -352,33 +351,24 @@ ProcArrayEndTransaction(PGPROC *proc, TransactionId latestXid) if (TransactionIdIsValid(latestXid)) { /* - * We must lock ProcArrayLock while clearing proc->xid, so that we do - * not exit the set of "running" transactions while someone else is - * taking a snapshot. See discussion in - * src/backend/access/transam/README. + * Use the shared work queue mechanism to get the work done. If the + * ProcArrayLock is available, it will done immediately, otherwise it + * will be queued up and some other backend (the one who releases the + * lock las
[HACKERS] proposal: better support for debugging of overloaded functions
Hello, I am missing a unique identifier in the exception info. I would like to use it in combination with the \sf statement. I have a log:

WARNING: NP_CPS: a cannot to build a RSLT object DETAIL: dsql_text: SELECT * FROM public._npacceptflatfile(order_nr:=to_number('O0032', 'O')::int,sequence_nr:=1,ref_sequence_nr:=2,recipient_op:=201,losing_op:=303) message: cannot to identify real type for record type variable CONTEXT: PL/pgSQL function "assign_rslts" line 50 at assignment SQL statement "SELECT workflow.assign_rslts('2011-12-18', '09:30:30', to_operator := 201, from_operator := 303)" PL/pgSQL function "inline_code_block" line 855 at PERFORM

and I would like to look at the "assign_rslts" function, but

ohs=# \sf workflow.assign_rslts ERROR: more than one function named "workflow.assign_rslts"

and I have to find the parameters and manually build a parameter list. My proposal is to enhance the CONTEXT line with the function's oid, plus the possibility to use this oid in \sf and \df, something like

CONTEXT: PL/pgSQL function "assign_rslts" line 50 at assignment (oid: 65903) ... \sf+ 65903

This simple feature could reduce the time that is necessary to identify a bug in overloaded functions. Another possibility is just enhancing the context line to be more friendly to the \sf statement:

CONTEXT: PL/pgSQL function "workflow.assign_rslts(date,time without time zone,operatorid_type,operatorid_type)" line 50 at assignment

But this is not very readable and its implementation is harder, because at exception time there is no access to the system tables - so this string would have to be cached somewhere. Notes, ideas?? Regards Pavel Stehule -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Core Extensions relocation
On 11/17/2011 03:18 PM, Peter Eisentraut wrote:
> Who's to say that after this, the core extensions won't end up in a new
> separate package postgresql-extensions (or similar) or might even stay
> in postgresql-contrib, for compatibility?

I don't know why packagers would make an active decision that would make their lives more difficult, with no benefit to them and a regression against project recommendations for their users. The last thing anyone packaging PostgreSQL wants is more packages to deal with; there are already too many. Each of the current sub-packages has a legitimate technical or distribution-standard reason to exist--guidelines like "break out client and server components" or "minimize the package dependencies for the main server". I can't think of any good reason that would inspire the sort of drift you're concerned about.

There's little compatibility argument beyond consistency with the previous packages. The reason this is suddenly feasible now is that the under-the-hood changes are almost all hidden by CREATE EXTENSION.

And if someone wanted to wander this way, they'd end up having to maintain a doc patch to address the fact that they've broken with project recommendations. This text in what I submitted would no longer be true: "This appendix contains information regarding core extensions that are built and included with a standard installation of PostgreSQL."

One of the reasons I picked the name I did was to contrast with the existing description of contrib: "porting tools, analysis utilities, and plug-in features that are not part of the core PostgreSQL system, mainly because they address a limited audience or are too experimental to be part of the main source tree." That says it's perfectly fine to make those optional in another package--they're not "part of the core". That scary wording is practically telling packagers to separate them, so it's easy to keep the experimental stuff away from the production-quality components.
--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
Re: [HACKERS] Inlining comparators as a performance optimisation
On Fri, Nov 18, 2011 at 5:20 AM, Robert Haas wrote:
> I think that we should really consider doing with this patch what Tom
> suggested upthread; namely, looking for a mechanism to allow
> individual datatypes to offer up a comparator function that doesn't
> require bouncing through FunctionCall2Coll(). It seems to me that
> duplicating the entire qsort() algorithm is a little iffy. Sure, in
> this case it works out to a win. But it's only a small win -
> three-quarters of it is in the uncontroversial activity of reducing
> the impedance mismatch - and how do we know it will always be a win?
> Adding more copies of the same code can be an anti-optimization if it
> means that a smaller percentage of it fits in the instruction cache,
> and sometimes small changes in runtime are caused by random shifts in
> the layout of memory that align things more or less favorably across
> cache lines rather than by real effects. Now it may well be that this
> is a real effect, but will it still look as good when we do this for
> 10 data types? For 100 data types?
>
> In contrast, it seems to me that reducing the impedance mismatch is
> something that we could go and do across our entire code base, and
> every single data type would benefit from it. It would also be
> potentially usable by other sorting algorithms, not just quick sort.

I don't think it's credible to implement that kind of generic improvement at this stage of the release cycle. It has a much bigger impact, since it potentially affects all internal datatypes and external ones as well. Definitely a longer-term way forward.

If individual datatypes offer up comparator functions, that is easily going to result in more code than is being suggested here. So the argument about flooding the CPU cache works against your alternate proposal, not in favour of it. We have no proof that we need to do this for 10 or 100 data types; we only currently have proof that there is a gain for the most common types.
Of course, it sounds like it might be useful to allow any data type to gain an advantage, but we shouldn't be blind to the fact that almost nobody will use such a facility, and if they do, the code won't be written for a long time yet. If this came as a request from custom-datatype authors complaining of slow sorts it would be different, but it didn't, so we don't even know whether anybody would ever write user-defined comparator routines. Rejecting a patch because of a guessed user requirement is not good.

Peter's suggested change adds very few lines of code, and those compile to some very terse code, a few hundred instructions at the very most. Spending an extra few cachelines to improve qsort by so much is still an easy overall win.

The OP's change improves qsort dramatically, and it is a small, isolated patch. There is no significant downside. We also have it now, so let's commit this, chalk up another very good performance improvement, and use our time on something else this commitfest, such as the flexlocks idea.

--
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] range_adjacent and discrete ranges
While thinking about range_cmp_bounds, I started to think that the way range_adjacent works is wrong.

range_adjacent() depends on the bounds of two ranges to match up, such that the boundary values are equal, but one is exclusive and the other inclusive, and one is a lower bound and the other an upper bound. That makes perfect sense for continuous ranges, because that is the only way they can possibly be adjacent. It also works for the discrete ranges as defined so far, because they use a canonical function that translates the values to [) form. But if someone were to write a canonical function that translates the ranges to [] or () form, range_adjacent would be useless.

There are a few approaches to solving it:

1. Make the canonical function accept a "format" argument, much like the constructor, and have an output format stored in the catalog that would be passed in under most circumstances. However, range_adjacent could pass in its own form that would be useful for its purposes.

2. Replace the canonical function with "inc" and "dec" functions, or some variation thereof. We'd still need some kind of output format to match the current behavior; otherwise every range would always be output in [) form (I don't necessarily object to that, but it would be a behavior difference for user-defined range types).

3. Remove range_adjacent.

4. Do nothing, and document that the canonical function should translate to either [) or (] if you want range_adjacent to work.

Thoughts?

Regards,
	Jeff Davis
Re: [HACKERS] Are range_before and range_after commutator operators?
On Thu, 2011-11-17 at 17:10 -0500, Tom Lane wrote:
> Applied, thanks. These comments aren't quite what I'd hoped for though.
> What I'm lacking is the conceptual context, ie, why is a
> less-equal-greater primitive for bounds a good thing? It seems like
> when you consider the four possible directional (lower/upper)
> combinations, the same result from range_cmp_bounds means something
> different in each case, and I find that confusing. I wonder whether
> functions defined along set-theoretic lines (this bound is strictly
> weaker or stronger than this other one, or these bounds together define
> an empty or singleton or non-singleton set) might be more transparent.

I think it comes down to how helpful it is to define higher-level functions in terms of range_cmp_bounds versus some other function (or set of functions). range_cmp_bounds seemed to work out fairly well for most of the operators, and it was the best that I came up with. The nice thing about it is that it can compare a lower bound to another lower bound, or to an upper bound.

Then again, perhaps I tried too hard to unify those concepts, and it just led to complexity? I'm open to suggestions.

Regards,
	Jeff Davis