[HACKERS] Re: Review of "pg_basebackup and pg_receivexlog to use non-blocking socket communication", was: Re: Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2013-01-15 Thread Abhijit Menon-Sen
Hi.

This patch was marked "Needs review" with no reviewers in the ongoing
CF, so I decided to take a look at it. I see that Zoltan has posted a
review, so I've added him to the list.

But I took a look at the latest patch in any case. Here are some
comments, mostly cosmetic ones.

> diff -dcrpN postgresql.orig/doc/src/sgml/ref/pg_basebackup.sgml 
> postgresql/doc/src/sgml/ref/pg_basebackup.sgml
> *** postgresql.orig/doc/src/sgml/ref/pg_basebackup.sgml   2013-01-05 
> 17:34:30.742135371 +0100
> --- postgresql/doc/src/sgml/ref/pg_basebackup.sgml2013-01-07 
> 15:11:40.787007890 +0100
> *** PostgreSQL documentation
> *** 400,405 
> --- 400,425 
>
>   
>
> +   <term><option>-r <replaceable class="parameter">interval</replaceable></option></term>
> +   <term><option>--recvtimeout=<replaceable class="parameter">interval</replaceable></option></term>
> +   
> +
> + time that receiver waits for communication from server (in seconds).
> +
> +   
> +   

I would reword this as "The maximum time (in seconds) to wait for data
from the server (default: wait forever)".

> +  
> +   <term><option>-t <replaceable class="parameter">interval</replaceable></option></term>
> +   <term><option>--conntimeout=<replaceable class="parameter">interval</replaceable></option></term>
> +   
> +
> + time that client wait for connection to establish with server (in 
> seconds).
> +
> +   
> +

Likewise, "The maximum time (in seconds) to wait for a connection to the
server to succeed (default: wait forever)".

Same thing in pg_receivexlog.sgml. Also, there's trailing whitespace in
various places in these files (and elsewhere in the patch), which should
be fixed.

> diff -dcrpN postgresql.orig/src/bin/pg_basebackup/pg_basebackup.c 
> postgresql/src/bin/pg_basebackup/pg_basebackup.c
> *** postgresql.orig/src/bin/pg_basebackup/pg_basebackup.c 2013-01-05 
> 17:34:30.778135625 +0100
> --- postgresql/src/bin/pg_basebackup/pg_basebackup.c  2013-01-07 
> 15:16:24.610037886 +0100
> *** bool  streamwal = false;
> *** 45,50 
> --- 45,54 
>   boolfastcheckpoint = false;
>   boolwriterecoveryconf = false;
>   int standby_message_timeout = 10 * 1000;/* 10 
> sec = default */
> + int standby_recv_timeout = 60*1000; /* 60 sec = default */
> + char*standby_connect_timeout = NULL;

I don't really like standby_recv_timeout being an int and
standby_connect_timeout being a char *. I understand that it's so that
it can be assigned to "values[i]" in GetConnection(), but that reason is
very distant, and not obvious from this code at all.

That said, I don't know if it's really worth bothering with.
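
(For what it's worth, if you did want to keep it an int, something like
this at the point of use in GetConnection() would serve; the array names
here are illustrative, not taken from the patch:)

    char        connect_timeout_str[32];

    snprintf(connect_timeout_str, sizeof(connect_timeout_str),
             "%d", standby_connect_timeout);
    keywords[i] = "connect_timeout";
    values[i] = connect_timeout_str;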

> + #define NAPTIME_PER_CYCLE 100   /* max sleep time between cycles 
> (100ms) */

This probably needs a better comment. Why are we sleeping between
cycles? What cycles?

> + printf(_("  -r, --recvtimeout=INTERVAL time that receiver waits for 
> communication from\n"
> +" server (in seconds)\n"));
> + printf(_("  -t, --conntimeout=INTERVAL time that client wait for 
> connection to establish\n"
> +" with server (in seconds)\n")); 

Same comments about wording apply, but perhaps there's no need to
mention the default.

> ! if (r == 0 || (r < 0 && errno == EINTR))
> ! {
> ! /*
> !  * Got a timeout or signal. Before Continuing 
> the loop, check for timeout.
> !  */
> ! if (standby_recv_timeout > 0)
> ! {
> ! now = localGetCurrentTimestamp();

I'd make "now" local to this block, and get rid of the comment. The two
"if"s are perfectly clear. This applies to the same pattern in other
places in the patch as well.

> ! if 
> (localTimestampDifferenceExceeds(last_recv_timestamp, now, 
> standby_recv_timeout))
> ! {
> ! fprintf(stderr, _("%s: 
> terminating DB File receive due to timeout\n"),

Better wording? "DB File receive" is confusing. Even something like
"Closing connection due to read timeout" would be better. Or perhaps
you can make it like the following message, slightly lower:

> ! if (PQconsumeInput(conn) == 0)
> ! {
> ! fprintf(stderr,
> ! _("%s: could not receive data 
> from WAL Sender: %s"),
> ! progname, PQerrorMessage(conn));

…and in the former case, say "read timeout" instead of PQerrorMessage().
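
Putting those suggestions together, the shape I have in mind is roughly
this (only a sketch reusing the patch's own helpers, not tested):

    if (r == 0 || (r < 0 && errno == EINTR))
    {
        if (standby_recv_timeout > 0)
        {
            TimestampTz now = localGetCurrentTimestamp();

            if (localTimestampDifferenceExceeds(last_recv_timestamp, now,
                                                standby_recv_timeout))
            {
                fprintf(stderr,
                        _("%s: closing connection due to read timeout\n"),
                        progname);
                /* ... and bail out of the receive loop here ... */
            }
        }
        continue;       /* timeout or signal, retry the select() */
    }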

> ! /* Set the last reply timestamp */
> ! last_recv_timestamp = localGetCurrentTimestamp();
> ! 
> ! /* Some data is received, so go back read them in 
> buf

Re: [HACKERS] Parallel query execution

2013-01-15 Thread Tom Lane
Alvaro Herrera  writes:
> There are still 34 items needing attention in CF3.  I suggest that, if
> you have some spare time, your help would be very much appreciated
> there.  The commitfest that started on Jan 15th has 65 extra items.
> Anything currently listed in CF3 can rightfully be considered to be part
> of CF4, too.

In case you hadn't noticed, we've totally lost control of the CF
process.  Quite aside from the lack of progress on closing CF3, major
hackers who should know better are submitting significant new feature
patches now, despite our agreement in Ottawa that nothing big would be
accepted after CF3.  At this point I'd bet against releasing 9.3 during
2013.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Curious buildfarm failures (fwd)

2013-01-15 Thread Tom Lane
It's a compiler bug.

icc 11.1 apparently thinks that this loop in doPickSplit:

/*
 * Update nodes[] array to point into the newly formed innerTuple, so that
 * we can adjust their downlinks below.
 */
SGITITERATE(innerTuple, i, node)
{
nodes[i] = node;
}

is going to be iterated hundreds of times, because it expends a
ridiculous amount of effort on it, starting with a loop prolog that
prefetches about a thousand bytes starting at the nodes pointer :-(
(Why does it think it needs to prefetch an array it's only going to
write into?  Is IA64's cache hardware really that stupid?)
And it makes use of IA64's bizarre scheme for software-unrolling
loops, which I am going to do my darnedest to forget now that I've
learned it; but for the purposes of this bit you only need to know
that the br.wtop.dptk instruction "renames" registers 32 and up,
so that whatever is in r32 when the bottom of the loop is reached
will be in r33 on the next iteration, and r33's contents move to r34,
etc.  In this particular example, this ridiculous complication saves
a grand total of no instructions, but nevermind that.

Before starting the loop, the code has computed

r28 = innerTuple
r29 = nodes
r26 = r29 + 1200 (this is where it will continue the prefetching...)
r33 = 0
r35 = innerTuple + innerTuple->prefixSize + 8 (ie, the initial value of "nt")
r27 = innerTuple + innerTuple->prefixSize + 8 + 6

And the body of the SGITITERATE loop looks like

.b4_110: 
at top of loop, r35 contains "nt" pointer, r33 contains "i"
 (p17)  st8 [r29]=r35,8 //0: {940:3} 4456 0
store nt at *r29, increment r29 by 8 bytes (thus, assign to nodes[i])
 (p17)  add r32=1,r33   //0: {938:2} 4453 0
compute i+1, will be next value of i due to register rename
 (p17)  ld2 r36=[r28]   //1: {938:2} 4462 0
fetch first 2 bytes of innerTuple
 (p17)  ld2 r34=[r27],r33   //1: {938:2} 4459 0
fetch last 2 bytes of node tuple, on first iteration anyway ...
and then add the value of r33 to r27, which is all wrong
 (p17)  extr.u  r37=r36,3,13//2: {938:2} 4463 0
extract nNodes from fetched 2 bytes of innerTuple
 (p17)  extr.u  r33=r34,0,13 ;; //2: {938:2} 4460 0
extract size field of node tuple, or so it hopes
 (p17)  lfetch.nt1  [r26],8 //3: {938:2} 4454 0
useless prefetch more than a thousand bytes away from the action
 (p17)  cmp4.lt p16,p0=r32,r37  //3: {938:2} 4464 0
compare whether r32 (next value of i) < nNodes
 (p17)  add r34=r35,r33 //3: {938:2} 4461 0
set r34 (next value of r35) to r35 + size field, or so it hopes
 (p16)  br.wtop.dptk.b4_110 ;;  //3: {938:2} 4465 0
rename the registers and do it again, if the cmp4 returned true

The problem with this code is that r27, which ought to be always equal
to r35 + 6, is incremented by the wrong amount in the second ld2
instruction, namely by the "i" counter.  The value that *should* get
added to it is the node size field, ie the same value that's loaded into
r33 below that and then added to r35 in the last add instruction (and
then stored into r34, which is about to become r35).  So I think the
compiler has outsmarted itself as to which rotating register contains
which value when.
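
In C terms, what the loop is supposed to compute is roughly this (an
illustrative expansion reconstructed from the assembly above, not the
actual macro source):

    nt = (char *) innerTuple + innerTuple->prefixSize + 8;   /* first node tuple */
    for (i = 0; i < innerTuple->nNodes; i++)
    {
        nodes[i] = (SpGistNodeTuple) nt;
        /* advance by THIS node tuple's size, ie the 13-bit size field
         * that the second ld2/extr.u pair above is extracting */
        nt += IndexTupleSize((IndexTuple) nt);
    }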

The result of this breakage is that the set of node pointers computed by
the loop is totally wrong for all values after the first.  This means
the later loop that's trying to insert the now-known downlink TIDs into
the innerTuple's nodes is storing those TIDs into random locations, and
thus tromping all over memory.  The case where we get a reproducible
crash is where the Asserts in that loop notice that what's at the
pointed-to addresses isn't what's expected, before we manage to clobber
anything critical.

Diagnosis: icc 11.1 is not ready for prime time.

I shall now retire with a glass of wine and attempt to forget everything
I just learned about IA64.  What a bizarre architecture ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump transaction's read-only mode

2013-01-15 Thread Pavan Deolasee
On Wed, Jan 9, 2013 at 6:42 AM, Gurjeet Singh wrote:

>
>>
> I have updated the commitfest submission to link to the correct patch
> email.
>
>
Thanks Gurjeet.


> I initially thought that this patch deserves accompanying documentation
> because pg_dump's serializable transaction may error out because of a
> conflict. But the following line in the docs [1] confirms otherwise:
>
> "read-only transactions will never have serialization conflicts"
>
> So no doc patch necessary :)
>
>
Can you please mark the patch as "Ready for committer" if you think that
way?

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Parallel query execution

2013-01-15 Thread Jeff Janes
On Tuesday, January 15, 2013, Simon Riggs wrote:

> On 15 January 2013 22:55, Bruce Momjian wrote:
>
> >> Why is this being discussed now?
> >
> > It is for 9.4 and will take months.  I didn't think there was a better
> > time.  We don't usually discuss features during beta testing.
>
> Bruce, there are many, many patches on the queue. How will we ever get
> to beta testing if we begin open ended discussions on next release?
>
> If we can't finish what we've started for 9.3, why talk about 9.4?
>
> Yes, its a great topic for discussion, but there are better times.
>

Possibly so.  But unless we are to introduce a "thinkfest", how do we know
when such a better time would be?

Lately commit-fests have been basically a continuous thing, except during
beta, which would be an even worse time to discuss it.  I think that
parallel execution is huge and probably more likely for 9.5 (10.0?) than
9.4 for the general case (maybe some special cases for 9.4, like index
builds).  Yet the single biggest risk I see to the future of the project is
the lack of parallel execution.

Cheers,

Jeff


Re: [HACKERS] string escaping in tutorial/syscat.source

2013-01-15 Thread Josh Kupershmidt
On Tue, Jan 15, 2013 at 6:35 PM, Jeff Janes  wrote:

> Do you propose back-patching this?  You could argue that this is a bug in
> 9.1 and 9.2.   Before that, they generate deprecation warnings, but do not
> give the wrong answer.

I think that backpatching to 9.1 would be reasonable, though I won't
complain if the fix is only applied to HEAD.

> If it is only to be applied to HEAD, or only to 9.1, 9.2, and HEAD, then
> this part seems to be unnecessary and I think should be removed (setting a
> value to its default is more likely to cause confusion than remove
> confusion):
>
> SET standard_conforming_strings TO on;
>
> and the corresponding reset as well.

Well, it may be unnecessary for people who use the modern default
standard_conforming_strings. But some people have kept
standard_conforming_strings=off in recent versions because they have
old code which depends on this. And besides, it seems prudent to make
the dependency explicit, rather than just hoping the user has the
correct setting, and silently giving wrong results if not. Plus being
able to work with old server versions.

> Other than that, it does what it says (fix a broken example), does it
> cleanly, we want this, and I have no other concerns.
>
> I guess the src/tutorial directory could participate in regression tests, in
> which case this problem would have been detected when introduced, but I
> don't think I can demand that you invent regression tests for a feature you
> are just fixing rather than creating.

Yeah, I don't think tying into the regression tests is warranted here.

Josh


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel query execution

2013-01-15 Thread Alvaro Herrera
Bruce Momjian escribió:
> On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote:
> > 
> > 
> > On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus  wrote:
> > 
> > Claudio, Stephen,
> > 
> > It really seems like the areas where we could get the most "bang for the
> > buck" in parallelism would be:
> > 
> > 1. Parallel sort
> > 2. Parallel aggregation (for commutative aggregates)
> > 3. Parallel nested loop join (especially for expression joins, like GIS)
> > 
> > parallel data load? :/
> 
> We have that in pg_restore, and I think we are getting parallel dump in
> 9.3, right?  Unfortunately, I don't see it in the last 9.3 commit-fest. 
> Is it still being worked on?

It's in the previous-to-last commitfest.  IIRC that patch required
review and testing from people with some Windows background.

There are still 34 items needing attention in CF3.  I suggest that, if
you have some spare time, your help would be very much appreciated
there.  The commitfest that started on Jan 15th has 65 extra items.
Anything currently listed in CF3 can rightfully be considered to be part
of CF4, too.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel query execution

2013-01-15 Thread Claudio Freire
On Wed, Jan 16, 2013 at 12:55 AM, Stephen Frost  wrote:
>> If memory serves me correctly (and it does, I suffered it a lot), the
>> performance hit is quite considerable. Enough to make it "a lot worse"
>> rather than "not as good".
>
> I feel like we must not be communicating very well.
>
> If the CPU is pegged at 100% and the I/O system is at 20%, adding
> another CPU at 100% will bring the I/O load up to 40% and you're now
> processing data twice as fast overall

Well, there's the fault in your logic. It won't be as linear. Adding
another sequential scan will decrease bandwidth efficiency: if the I/O
system was doing, say, 10MB/s at 20% load, now it will be doing 20MB/s
at 80% load (maybe even worse). Quite suddenly you'll meet diminishing
returns, and the I/O subsystem which wasn't the bottleneck will become
it, bandwidth being the key. You might end up with less bandwidth than
you started with, if you go far enough past that knee.

Add some concurrent operations (connections) to the mix and it just gets worse.

Figuring out where the knee is may be the hardest problem you'll face.
I don't think it'll be predictable enough to make I/O parallelization
in that case worth the effort.

If you instead think of parallelizing random I/O (say index scans
within nested loops), that might work (or it might not). Again it
depends a helluva lot on what else is contending with the I/O
resources and how far ahead of optimum you push it. I've faced this
problem when trying to prefetch on index scans. If you try to prefetch
too much, you induce extra delays and it's a bad tradeoff.

Feel free to do your own testing.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel query execution

2013-01-15 Thread Michael Paquier
On Wed, Jan 16, 2013 at 1:32 PM, Bruce Momjian  wrote:

> On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote:
> >
> >
> > On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus  wrote:
> >
> > Claudio, Stephen,
> >
> > It really seems like the areas where we could get the most "bang for
> the
> > buck" in parallelism would be:
> >
> > 1. Parallel sort
> > 2. Parallel aggregation (for commutative aggregates)
> > 3. Parallel nested loop join (especially for expression joins, like
> GIS)
> >
> > parallel data load? :/
>
> We have that in pg_restore, and I think we are getting parallel dump in
> 9.3, right?  Unfortunately, I don't see it in the last 9.3 commit-fest.
> Is it still being worked on?
>
Not exactly, I meant something like being able to use parallel processing
when doing INSERT or COPY directly in core. If there is a parallel
processing infrastructure, it could also be used for such write operations.
I agree that the cases mentioned by Josh are far more appealing though...
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] Parallel query execution

2013-01-15 Thread Bruce Momjian
On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote:
> 
> 
> On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus  wrote:
> 
> Claudio, Stephen,
> 
> It really seems like the areas where we could get the most "bang for the
> buck" in parallelism would be:
> 
> 1. Parallel sort
> 2. Parallel aggregation (for commutative aggregates)
> 3. Parallel nested loop join (especially for expression joins, like GIS)
> 
> parallel data load? :/

We have that in pg_restore, and I think we are getting parallel dump in
9.3, right?  Unfortunately, I don't see it in the last 9.3 commit-fest. 
Is it still being worked on?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel query execution

2013-01-15 Thread Michael Paquier
On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus  wrote:

> Claudio, Stephen,
>
> It really seems like the areas where we could get the most "bang for the
> buck" in parallelism would be:
>
> 1. Parallel sort
> 2. Parallel aggregation (for commutative aggregates)
> 3. Parallel nested loop join (especially for expression joins, like GIS)
>
parallel data load? :/
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] Parallel query execution

2013-01-15 Thread Josh Berkus
Claudio, Stephen,

It really seems like the areas where we could get the most "bang for the
buck" in parallelism would be:

1. Parallel sort
2. Parallel aggregation (for commutative aggregates)
3. Parallel nested loop join (especially for expression joins, like GIS)


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel query execution

2013-01-15 Thread Stephen Frost
* Josh Berkus (j...@agliodbs.com) wrote:
> Actually, thanks to much faster storage (think SSD, SAN), it's easily
> possible for PostgreSQL to become CPU-limited on a seq scan query, even
> when reading from disk.

Particularly with a complex filter being applied or if it's feeding into
something above that's expensive..

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Parallel query execution

2013-01-15 Thread Josh Berkus

>> but there will be
>> quite a few cases where it's much, much better.
> 
> Just cached segments.

Actually, thanks to much faster storage (think SSD, SAN), it's easily
possible for PostgreSQL to become CPU-limited on a seq scan query, even
when reading from disk.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel query execution

2013-01-15 Thread Stephen Frost
* Claudio Freire (klaussfre...@gmail.com) wrote:
> On Wed, Jan 16, 2013 at 12:13 AM, Stephen Frost  wrote:
> > Sequentially scanning the *same* data over and over is certainly
> > counterproductive.  Synchroscans fixed that, yes.  That's not what we're
> > talking about though- we're talking about scanning and processing
> > independent sets of data using multiple processes.
> 
> I don't see the difference. Blocks are blocks (unless they're cached).

Not quite.  Having to go out to the kernel isn't free.  Additionally,
the seq scans used to pollute our shared buffers prior to
synch-scanning, which didn't help things.

> >  It's certainly
> > possible that in some cases that won't be as good
> 
> If memory serves me correctly (and it does, I suffered it a lot), the
> performance hit is quite considerable. Enough to make it "a lot worse"
> rather than "not as good".

I feel like we must not be communicating very well.

If the CPU is pegged at 100% and the I/O system is at 20%, adding
another CPU at 100% will bring the I/O load up to 40% and you're now
processing data twice as fast overall.  If you're running a single CPU
at 20% and your I/O system is at 100%, then adding another CPU isn't
going to help and may even degrade performance by causing problems for
the I/O system.  The goal of the optimizer will be to model the plan to
account for exactly that, as best it can.

> > but there will be
> > quite a few cases where it's much, much better.
> 
> Just cached segments.

No, certainly not just cached segments.  Any situation where the CPU is
the bottleneck.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] log_lock_waits to identify transaction's relation

2013-01-15 Thread Stephen Frost
Simon,

* Simon Riggs (si...@2ndquadrant.com) wrote:
> Attached patch passes through further information about the lock wait,
> so we can display the following message instead
>LOG: process %d acquired %s on transaction %u on relation %u of
> database %u after %ld.%03d ms

I love this idea.  Please take these comments as an initial/quick review
because I'd really like to see this get in.

A couple quick notes regarding the patch- what does
GetXactLockTableRelid really provide..?  This patch does use it outside
of lmgr.c, where XactLockTableRelid is defined.  Second, do we really
need to describeXact bool to DescribeLockTag..?  Strikes me as
unnecessary- if the information is available, include it.

Lastly, I really don't like the changes made to XactLockTableWait() and
friends.  They had a very clearly defined and simple goal previously and
the additional parameter seems to muddy things a bit, particularly
without any comments about what it's all about or why it's there.  I
understand that you're trying to pass the necessary information down to
where the log is generated, but it doesn't feel quite right.  Also, what
about VirtualXactLockTableWait()..?  Should that have a similar
treatment?

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Parallel query execution

2013-01-15 Thread Claudio Freire
On Wed, Jan 16, 2013 at 12:13 AM, Stephen Frost  wrote:
> * Claudio Freire (klaussfre...@gmail.com) wrote:
>> On Tue, Jan 15, 2013 at 8:19 PM, Bruce Momjian  wrote:
>> > The 1GB idea is interesting.  I found in pg_upgrade that file copy would
>> > just overwhelm the I/O channel, and that doing multiple copies on the
>> > same device had no win, but those were pure I/O operations --- a
>> > sequential scan might be enough of a mix of I/O and CPU that parallelism
>> > might help.
>>
>> AFAIR, synchroscans were introduced because multiple large sequential
>> scans were counterproductive (big time).
>
> Sequentially scanning the *same* data over and over is certainly
> > counterproductive.  Synchroscans fixed that, yes.  That's not what we're
> talking about though- we're talking about scanning and processing
> independent sets of data using multiple processes.

I don't see the difference. Blocks are blocks (unless they're cached).

>  It's certainly
> possible that in some cases that won't be as good

If memory serves me correctly (and it does, I suffered it a lot), the
performance hit is quite considerable. Enough to make it "a lot worse"
rather than "not as good".

> but there will be
> quite a few cases where it's much, much better.

Just cached segments.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel query execution

2013-01-15 Thread Stephen Frost
* Claudio Freire (klaussfre...@gmail.com) wrote:
> On Tue, Jan 15, 2013 at 8:19 PM, Bruce Momjian  wrote:
> > The 1GB idea is interesting.  I found in pg_upgrade that file copy would
> > just overwhelm the I/O channel, and that doing multiple copies on the
> > same device had no win, but those were pure I/O operations --- a
> > sequential scan might be enough of a mix of I/O and CPU that parallelism
> > might help.
> 
> AFAIR, synchroscans were introduced because multiple large sequential
> scans were counterproductive (big time).

Sequentially scanning the *same* data over and over is certainly
counterproductive.  Synchroscans fixed that, yes.  That's not what we're
talking about though- we're talking about scanning and processing
independent sets of data using multiple processes.  It's certainly
possible that in some cases that won't be as good, but there will be
quite a few cases where it's much, much better.

Consider a very complicated function running against each row which
makes the CPU the bottleneck instead of the i/o system.  That type of a
query will never run faster than a single CPU in a single-process
environment, regardless of if you have synch-scans or not, while in a
multi-process environment you'll take advantage of the extra CPUs which
are available and use more of the I/O bandwidth that isn't yet
exhausted.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] transforms

2013-01-15 Thread Peter Eisentraut
Here is an updated patch for the transforms feature.  The previous
discussion was here:
http://www.postgresql.org/message-id/1339713732.11971.79.ca...@vanquo.pezone.net

Old news: At the surface, this contains:

- New catalog pg_transform

- CREATE/DROP TRANSFORM

As proofs of concept and useful applications, I have included transforms for

- PL/Perl - hstore
- PL/Python - hstore
- PL/Python - ltree

New news: I have tried to address all issues raised during previous
reviews.

The regression tests for hstore_plpython and ltree_plpython don't pass
with Python 3.  This needs to be fixed, obviously, but it's an issue
unrelated to the core functionality.

As a demo that this can be used externally as well, I have written a
proof-of-concept transform between plpython and the mpz type in the pgmp
extension (multi-precision arithmetic extension):
https://github.com/petere/pgmp/tree/transforms/plpython

Note: There was a lot of churn lately in src/backend/commands/, and I
had to fix merge conflicts several times, so you need source as of about
right now to apply this patch cleanly.

Side issue/peculiarity: I had to change some shared library linking
flags for OS X.  Ordinarily, when linking a dynamically loadable module
on OS X, if there are unresolved symbols, it's an error.  But for
example, when we link hstore_plpython, we expect some symbols to be
resolved in hstore.so or plpythonu.so, so we need to ignore unresolved
symbols.  This change basically just makes OS X behave like other
platforms in this regard.  There might be other portability issues on
other platforms.



pg-transforms-20130115.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] COPY .. COMPRESSED

2013-01-15 Thread Claudio Freire
On Tue, Jan 15, 2013 at 7:46 PM, Tom Lane  wrote:
>> Compressing every small packet seems like it'd be overkill and might
>> surprise people by actually reducing performance in the case of lots of
>> small requests.
>
> Yeah, proper selection and integration of a compression method would be
> critical, which is one reason that I'm not suggesting a plugin for this.
> You couldn't expect any-random-compressor to work well.  I think zlib
> would be okay though when making use of its stream compression features.
> The key thing there is to force a stream buffer flush (too lazy to look
> up exactly what zlib calls it, but they have the concept) exactly when
> we're about to do a flush to the socket.  That way we get cross-packet
> compression but don't have a problem with the compressor failing to send
> the last partial message when we need it to.

Just a "stream flush bit" (or stream reset bit) on the packet header
would do. First packet on any stream would be marked, and that's it.
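
For what it's worth, the zlib primitive Tom is thinking of is
Z_SYNC_FLUSH: passing it to deflate() emits a flush point so the peer
can decompress everything sent so far. Roughly (a sketch only, ignoring
the real pqcomm buffering and error handling; send_to_socket() is made
up here):

    /* zs is a z_stream already set up with deflateInit() */
    static void
    compress_then_flush(z_stream *zs, char *buf, size_t len)
    {
        char    out[8192];

        zs->next_in = (Bytef *) buf;
        zs->avail_in = len;
        do
        {
            zs->next_out = (Bytef *) out;
            zs->avail_out = sizeof(out);
            deflate(zs, Z_SYNC_FLUSH);      /* flush point at the packet boundary */
            send_to_socket(out, sizeof(out) - zs->avail_out);   /* hypothetical */
        } while (zs->avail_out == 0);
    }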


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] log_lock_waits to identify transaction's relation

2013-01-15 Thread Simon Riggs
When there is contention between concurrent transactions it shows up
with log_lock_waits as
   LOG:  process %d acquired %s on %s transaction %u after %ld.%03d ms

Which is mostly useless for identifying and eliminating the contention
in the application since you can't tell which table is causing
problems. This only occurs for lock waits on transactions, not on
other lock types such as tuples, which are much more informative, for
example
   LOG:  process %d acquired %s on tuple(x,y) of relation %u of
database %u after %ld.%03d ms

Attached patch passes through further information about the lock wait,
so we can display the following message instead
   LOG: process %d acquired %s on transaction %u on relation %u of
database %u after %ld.%03d ms
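
For instance (values invented for illustration) that would come out as
   LOG:  process 12345 acquired ShareLock on transaction 678 on relation
16418 of database 16384 after 1023.456 ms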

This should help identify benefit/loss from the FKlocks feature, since
MultiXactIdWait() shows up in the log as contention of this type.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


identify_xact_waits.v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel query execution

2013-01-15 Thread Claudio Freire
On Tue, Jan 15, 2013 at 8:19 PM, Bruce Momjian  wrote:
>> Given our row-based storage architecture, I can't imagine we'd do
>> anything other than take a row-based approach to this..  I would think
>> we'd do two things: parallelize based on partitioning, and parallelize
>> seqscan's across the individual heap files which are split on a per-1G
>> boundary already.  Perhaps we can generalize that and scale it based on
>> the number of available processors and the size of the relation but I
>> could see advantages in matching up with what the kernel thinks are
>> independent files.
>
> The 1GB idea is interesting.  I found in pg_upgrade that file copy would
> just overwhelm the I/O channel, and that doing multiple copies on the
> same device had no win, but those were pure I/O operations --- a
> sequential scan might be enough of a mix of I/O and CPU that parallelism
> might help.

AFAIR, synchroscans were introduced because multiple large sequential
scans were counterproductive (big time).


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel query execution

2013-01-15 Thread Bruce Momjian
On Wed, Jan 16, 2013 at 09:11:20AM +0900, Michael Paquier wrote:
> 
> 
> On Wed, Jan 16, 2013 at 7:14 AM, Bruce Momjian  wrote:
> 
> I mentioned last year that I wanted to start working on parallelism:
> 
> https://wiki.postgresql.org/wiki/Parallel_Query_Execution
> 
> Years ago I added thread-safety to libpq.  Recently I added two parallel
> execution paths to pg_upgrade.  The first parallel path allows execution
> of external binaries pg_dump and psql (to restore).  The second parallel
> path does copy/link by calling fork/thread-safe C functions.  I was able
> to do each in 2-3 days.
> 
> I believe it is time to start adding parallel execution to the backend.
> We already have some parallelism in the backend:
> effective_io_concurrency and helper processes.  I think it is time we
> start to consider additional options.
> 
> Parallelism isn't going to help all queries, in fact it might be just a
> small subset, but it will be the larger queries.  The pg_upgrade
> parallelism only helps clusters with multiple databases or tablespaces,
> but the improvements are significant.
> 
> I have summarized my ideas by updating our Parallel Query Execution wiki
> page:
> 
> https://wiki.postgresql.org/wiki/Parallel_Query_Execution
> 
> Please consider updating the page yourself or posting your ideas to this
> thread.  Thanks.
> 
> Honestly that would be a great feature, and I would be happy helping working 
> on
> it.
> Taking advantage of parallelism in a server with multiple core, especially for
> things like large sorting operations would be great.
> Just thinking loudly, but wouldn't it be the role of the planner to determine
> if such or such query is worth using parallelism? The executor would then be 
> in
> charge of actually firing the tasks in parallel that planner has determined
> necessary to do.

Yes, it would probably be driven off of the optimizer statistics.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Curious buildfarm failures (fwd)

2013-01-15 Thread Tom Lane
Andres Freund  writes:
> -O0 passes

Grumble... suspect we're chasing another compiler bug now, but ...

You might try -O1; if that shows the bug it'll probably be a tad easier
to debug in.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Curious buildfarm failures (fwd)

2013-01-15 Thread Andres Freund
On 2013-01-16 02:34:52 +0100, Andres Freund wrote:
> On 2013-01-16 02:13:26 +0100, Andres Freund wrote:
> > On 2013-01-15 19:56:52 -0500, Tom Lane wrote:
> > > Andres Freund  writes:
> > > > FWIW it's also triggerable if two other function calls are placed inside
> > > > the above if() (I tried fprintf(stderr, "argh") and kill(0, 0)).
> > > 
> > > [ confused... ]  You mean replacing the abort() in the elog macro with
> > > one of these functions?  Or something else?
> > 
> > I mean replacing the elog(ERROR, "ForwardFsyncRequest must...") with any
> > two function calls inside a do/while(0). I just tried to place some
> > random functions there instead of the elog to make sure its unrelated,
> > and it still triggers the problem even before the elog commit. The
> > assembler output of that function changes wildly with tiny changes and I
> > don't understand IA-64 at all (does anybody?), so I don't see anything
> > we can do there.
> > 
> > > > It seems the change just made an existing issue visible.
> > > > No idea what to do about it.
> > > 
> > > Pretty clearly a compiler bug at this point.  Since there doesn't seem
> > > to be a clean workaround (no, I don't want to expand the struct
> > > assignment manually), and anyway we can't be sure that the bug doesn't
> > > also manifest in other places, recommending Sergey update his compiler
> > > seems like the thing to do.
> > 
> > Yea. Don't have a better suggestion.
> > 
> > > At this point I'm more interested in his report in
> > >  about
> > > the Assert at spgdoinsert.c:1222 failing.  That's pretty new code, so
> > > more likely to have a genuine bug, and I wonder if it's related to
> > > the spgist issue in <50ebf992.2000...@qunar.com> ...
> > 
> > Yes, it looks more like it could be something real. There are
> > suspiciously many other failing tests though (misc, with) that don't seem
> > to be related to the spgist crash.
> 
> #4  0x401a6320 in doPickSplit (index=0x6007ff48, state=0x3, 
> current=0x6ff7a700, parent=0x4, newLeafTuple=0x6, 
> level=512360, isNulls=64 '@', isNew=12 '\f') at spgdoinsert.c:1222
> (gdb) p parent
> $4 = {blkno = 1, buffer = 356, page = 0x2148eea0 "", offnum = 1, node 
> = 4}
> 
> (gdb) p &parent
> $7 = (SPPageDesc *) 0x6ff7a900

-O0 passes

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Curious buildfarm failures (fwd)

2013-01-15 Thread Andres Freund
On 2013-01-15 20:32:00 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2013-01-15 19:56:52 -0500, Tom Lane wrote:
> >> At this point I'm more interested in his report in
> >>  about
> >> the Assert at spgdoinsert.c:1222 failing.  That's pretty new code, so
> >> more likely to have a genuine bug, and I wonder if it's related to
> >> the spgist issue in <50ebf992.2000...@qunar.com> ...
> 
> > Yes, it looks more like it could be something real. There are
> > suspiciously many other failing tests though (misc, with) that don't seem
> > to be related to the spgist crash.
> 
> Looking again, the pg_regress output appears to indicate two separate
> crashes (one during rangetypes, the other during create_index).  The
> reported Assert trap was in the rangetypes test, but the other one
> could very easily have been from spgist code as well.  I'd tend to
> write off all the other reported diffs as followon damage from the
> crashes, at least without clearer evidence that they weren't.  There are
> very many instances in our regression tests where failure to complete
> one test results in bogus diffs in later ones, because DB objects don't
> exist or don't have the expected contents.

I just checked and it's just follow-up damage.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] string escaping in tutorial/syscat.source

2013-01-15 Thread Jeff Janes
On Sun, Oct 14, 2012 at 8:53 PM, Josh Kupershmidt wrote:
> Hi all,
> It seems the queries in ./src/tutorial/syscat.source use string
> escaping with the assumption that standard_conforming_strings is off,
> and thus give wrong results with modern versions. A simple fix is
> attached.

Hi Josh,

Do you propose back-patching this?  You could argue that this is a bug in
9.1 and 9.2.   Before that, they generate deprecation warnings, but do not
give the wrong answer.

If it is only to be applied to HEAD, or only to 9.1, 9.2, and HEAD, then
this part seems to be unnecessary and I think should be removed (setting a
value to its default is more likely to cause confusion than remove
confusion):

SET standard_conforming_strings TO on;

and the corresponding reset as well.

Other than that, it does what it says (fix a broken example), does it
cleanly, we want this, and I have no other concerns.

I guess the src/tutorial directory could participate in regression tests,
in which case this problem would have been detected when introduced, but I
don't think I can demand that you invent regression tests for a feature you
are just fixing rather than creating.

Thanks for the patch,

Jeff


Re: [HACKERS] Curious buildfarm failures (fwd)

2013-01-15 Thread Andres Freund
On 2013-01-16 02:13:26 +0100, Andres Freund wrote:
> On 2013-01-15 19:56:52 -0500, Tom Lane wrote:
> > Andres Freund  writes:
> > > FWIW it's also triggerable if two other function calls are placed inside
> > > the above if() (I tried fprintf(stderr, "argh") and kill(0, 0)).
> > 
> > [ confused... ]  You mean replacing the abort() in the elog macro with
> > one of these functions?  Or something else?
> 
> I mean replacing the elog(ERROR, "ForwardFsyncRequest must...") with any
> two function calls inside a do/while(0). I just tried to place some
> random functions there instead of the elog to make sure its unrelated,
> and it still triggers the problem even before the elog commit. The
> assembler output of that function changes wildly with tiny changes and I
> don't understand IA-64 at all (does anybody?), so I don't see anything
> we can do there.
> 
> > > It seems the change just made an existing issue visible.
> > > No idea what to do about it.
> > 
> > Pretty clearly a compiler bug at this point.  Since there doesn't seem
> > to be a clean workaround (no, I don't want to expand the struct
> > assignment manually), and anyway we can't be sure that the bug doesn't
> > also manifest in other places, recommending Sergey update his compiler
> > seems like the thing to do.
> 
> Yea. Don't have a better suggestion.
> 
> > At this point I'm more interested in his report in
> >  about
> > the Assert at spgdoinsert.c:1222 failing.  That's pretty new code, so
> > more likely to have a genuine bug, and I wonder if it's related to
> > the spgist issue in <50ebf992.2000...@qunar.com> ...
> 
> Yes, it looks more like it could be something real. There are
> suspiciously many other failing tests though (misc, with) that don't seem
> to be related to the spgist crash.

#3  0x40b5c710 in ExceptionalCondition (
conditionName=0x40c76d50 "!(( ((void) ((bool) ((! assert_enabled) 
|| ! (!(((bool) (((const void*)(&nodes[n]->t_tid) != ((void *)0)) && 
((&nodes[n]->t_tid)->ip_posid != 0) || (ExceptionalCondition(\"!(((bool) 
(((const void*)"..., 
errorType=0x40c4c5a0 "FailedAssertion", fileName=0x40c75d30 
"spgdoinsert.c", lineNumber=1222) at assert.c:54
#4  0x401a6320 in doPickSplit (index=0x6007ff48, state=0x3, 
current=0x6ff7a700, parent=0x4, newLeafTuple=0x6, 
level=512360, isNulls=64 '@', isNew=12 '\f') at spgdoinsert.c:1222
#5  0x401a12d0 in spgdoinsert (index=0x29856028, 
state=0x6ff7a9d0, heapPtr=0x601e6e7c, 
datum=6917546619826579712, isnull=0 '\0') at spgdoinsert.c:1996
#6  0x40195870 in spginsert (fcinfo=0x6ff7a9d0) at 
spginsert.c:222
#7  0x40b77dd0 in FunctionCall6Coll (flinfo=0x60102018, 
collation=0, arg1=2305843009373429800, arg2=6917546619826580944, 
arg3=6917546619826581200, arg4=6917529027643076220, 
arg5=2305843009373166576, arg6=0) at fmgr.c:1439
#8  0x40148b70 in index_insert (indexRelation=0x29856028, 
values=0x6ff7add0, isnull=0x6ff7aed0 "", 
heap_t_ctid=0x601e6e7c, heapRelation=0x29815bf0, 
checkUnique=UNIQUE_CHECK_NO) at indexam.c:216
#9  0x404e99f0 in ExecInsertIndexTuples (slot=0x601e55c0, 
tupleid=0x601e6e7c, estate=0x601e4f18)
at execUtils.c:1088
#10 0x40516710 in ExecModifyTable (node=0x0) at nodeModifyTable.c:249
#11 0x404c6350 in $$1$3_0$TAG$0ca$0$3 () at execProcnode.c:377
#12 0x404bba00 in ExecutorRun (queryDesc=0x601e4fb0, 
direction=NoMovementScanDirection, count=0) at execMain.c:1400
#13 0x408493f0 in PortalRunMulti (portal=0x600ff7f8, 
isTopLevel=-26 '�', dest=0x601ef658, 
altdest=0x601ef658, completionTag=0x6ff7b2d0 "") at 
pquery.c:185
#14 0x40848d20 in _setjmp_lpad_PortalRun_1$0$13 () at pquery.c:814
#15 0x40840c60 in exec_simple_query (
query_string=0x6018d4f8 "insert into test_range_spgist select 
'empty'::int4range from generate_series(1,500) g;")
at postgres.c:1048
#16 0x408370a0 in _setjmp_lpad_PostgresMain_0$0$51 () at postgres.c:3969
---Type  to continue, or q  to quit---
#17 0x40720240 in BackendStartup (port=0x600fc950) at 
postmaster.c:3989
#18 0x4071dc80 in ServerLoop () at postmaster.c:1575
#19 0x4071a700 in PostmasterMain (argc=9, argv=0x600dc300) at 
postmaster.c:1244
#20 0x405796d0 in main (argc=9, argv=0x600dc010) at main.c:197


#4  0x401a6320 in doPickSplit (index=0x6007ff48, state=0x3, 
current=0x6ff7a700, parent=0x4, newLeafTuple=0x6, 
level=512360, isNulls=64 '@', isNew=12 '\f') at spgdoinsert.c:1222
1222
Assert(ItemPointerGetBlockNumber(&nodes[n]->t_tid) == leafBlock);

(gdb) info locals
in = {nTuples = 227, datums = 0x60205060, level = 1}
out = {hasPrefix = 0 '\0', prefixDatum = 0, nNodes = 8, nodeLabels = 0

Re: [HACKERS] Curious buildfarm failures (fwd)

2013-01-15 Thread Tom Lane
Andres Freund  writes:
> On 2013-01-15 19:56:52 -0500, Tom Lane wrote:
>> At this point I'm more interested in his report in
>>  about
>> the Assert at spgdoinsert.c:1222 failing.  That's pretty new code, so
>> more likely to have a genuine bug, and I wonder if it's related to
>> the spgist issue in <50ebf992.2000...@qunar.com> ...

> Yes, it looks more like it could be something real. There are
> suspiciously many other failing tests though (misc, with) that don't seem
> to be related to the spgist crash.

Looking again, the pg_regress output appears to indicate two separate
crashes (one during rangetypes, the other during create_index).  The
reported Assert trap was in the rangetypes test, but the other one
could very easily have been from spgist code as well.  I'd tend to
write off all the other reported diffs as followon damage from the
crashes, at least without clearer evidence that they weren't.  There are
very many instances in our regression tests where failure to complete
one test results in bogus diffs in later ones, because DB objects don't
exist or don't have the expected contents.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Curious buildfarm failures (fwd)

2013-01-15 Thread Andres Freund
On 2013-01-15 19:56:52 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > FWIW it's also triggerable if two other function calls are placed inside
> > the above if() (I tried fprintf(stderr, "argh") and kill(0, 0)).
> 
> [ confused... ]  You mean replacing the abort() in the elog macro with
> one of these functions?  Or something else?

I mean replacing the elog(ERROR, "ForwardFsyncRequest must...") with any
two function calls inside a do/while(0). I just tried to place some
random functions there instead of the elog to make sure its unrelated,
and it still triggers the problem even before the elog commit. The
assembler output of that function changes wildly with tiny changes and I
don't understand IA-64 at all (does anybody?), so I don't see anything
we can do there.

> > It seems the change just made an existing issue visible.
> > No idea what to do about it.
> 
> Pretty clearly a compiler bug at this point.  Since there doesn't seem
> to be a clean workaround (no, I don't want to expand the struct
> assignment manually), and anyway we can't be sure that the bug doesn't
> also manifest in other places, recommending Sergey update his compiler
> seems like the thing to do.

Yea. Don't have a better suggestion.

> At this point I'm more interested in his report in
>  about
> the Assert at spgdoinsert.c:1222 failing.  That's pretty new code, so
> more likely to have a genuine bug, and I wonder if it's related to
> the spgist issue in <50ebf992.2000...@qunar.com> ...

Yes, it looks more like it could be something real. There are
suspiciously many other failing tests though (misc, with) that don't seem
to be related to the spgist crash.


Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Curious buildfarm failures (fwd)

2013-01-15 Thread Tom Lane
Andres Freund  writes:
> FWIW it's also triggerable if two other function calls are placed inside
> the above if() (I tried fprintf(stderr, "argh") and kill(0, 0)).

[ confused... ]  You mean replacing the abort() in the elog macro with
one of these functions?  Or something else?

> It seems the change just made an existing issue visible.
> No idea what to do about it.

Pretty clearly a compiler bug at this point.  Since there doesn't seem
to be a clean workaround (no, I don't want to expand the struct
assignment manually), and anyway we can't be sure that the bug doesn't
also manifest in other places, recommending Sergey update his compiler
seems like the thing to do.

At this point I'm more interested in his report in
 about
the Assert at spgdoinsert.c:1222 failing.  That's pretty new code, so
more likely to have a genuine bug, and I wonder if it's related to
the spgist issue in <50ebf992.2000...@qunar.com> ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-01-15 Thread Greg Smith
First rev of a simple corruption program is attached, in very C-ish 
Python.  The parameters I settled on are to accept a relation name, byte 
offset, byte value, and what sort of operation to do:  overwrite, AND, 
OR, XOR.  I like XOR here because you can fix it just by running the 
program again.  Rewriting this in C would not be terribly difficult, and 
most of the time spent on this version was figuring out what to do.
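
(The nice property of XOR is that (b XOR m) XOR m == b, so applying the
same mask a second time restores the original byte.)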


This follows Jeff's idea that the most subtle corruption is the hardest 
to spot, so testing should aim at the smallest unit of change.  If you 
can spot a one bit error in an unused byte of a page, presumably that 
will catch large errors like a byte swap.  I find some grim amusement 
that the checksum performance testing I've been trying to do got stuck 
behind a problem with a tiny, hard to detect single bit of corruption.


Here's pgbench_accounts being corrupted, the next to last byte on this line:

$ pgbench -i -s 1
$ ./pg_corrupt pgbench_accounts show
Reading byte 0 within file /usr/local/var/postgres/base/16384/25242
Current byte= 0 / $00
$ hexdump /usr/local/var/postgres/base/16384/25242 | head
000 00 00 00 00 00 00 00 00 00 00 04 00 0c 01 80 01
...
$ ./pg_corrupt pgbench_accounts 14 1
/usr/local/var/postgres base/16384/25242 8192 13434880 1640
Reading byte 14 within file /usr/local/var/postgres/base/16384/25242
Current byte= 128 / $80
Modified byte= 129 / $81
File modified successfully
$ hexdump /usr/local/var/postgres/base/16384/25242 | head
000 00 00 00 00 00 00 00 00 00 00 04 00 0c 01 81 01

That doesn't impact selecting all of the rows:

$ psql -c "select count(*) from pgbench_accounts"
 count
--------
 100000
(1 row)

And pg_dump works fine against the table too.  Tweaking this byte looks 
like a reasonable first test case for seeing if checksums can catch an 
error that query execution doesn't.


Next I'm going to test the functional part of the latest checksum patch; 
duplicate Jeff's targeted performance tests; and then run some of my 
own.  I wanted to get this little tool circulating first, now that it's
useful.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
#!/usr/bin/env python
#
# pg_corrupt
#
# Read in a byte of a PostgreSQL relation (a table or index) and allow writing
# it back with an altered value.
#
# Greg Smith 
# Copyright (c) 2013, Heroku, Inc.
# Released under The PostgreSQL Licence
#

from os.path import join
from subprocess import Popen,PIPE
import sys
import psycopg2

class Operations:
    SHOW=0
    WRITE=1
    AND=2
    OR=3
    XOR=4
    text=['show','write','and','or','xor']

def controldata_blocks_per_segment(pgdata):
    blocks_per_seg = 131072
    try:
        # TODO This doesn't work when called in an Emacs compile shell
        out, err = Popen("pg_controldata %s" % pgdata, stdout=PIPE,
                         shell=True).communicate()
        control_data=out.splitlines()
        for c in control_data:
            if c.startswith("Blocks per segment of large relation:"):
                blocks_per_seg=int(c.split(":")[1])
    except:
        print "Cannot determine blocks per segment, using default of",blocks_per_seg
    return blocks_per_seg

def get_table_info(conn,relation):
    cur = conn.cursor()
    q="SELECT \
       current_setting('data_directory') AS data_directory, \
       pg_relation_filepath(oid), \
       current_setting('block_size') AS block_size, \
       pg_relation_size(oid), \
       relpages \
       FROM pg_class \
       WHERE relname='%s'" % relation
    cur.execute(q)
    if cur.rowcount != 1:
        print "Error: did not return 1 row from pg_class lookup of %s" % relation
        return None
    table_info={}

    for i in cur:
        table_info['relation']=relation
        table_info['pgdata'] = i[0]
        table_info['filepath'] = i[1]
        table_info['block_size'] = int(i[2])
        table_info['relation_size'] = i[3]
        table_info['relpages'] = i[4]
        table_info['base_file_name']=join(i[0],i[1])
        table_info['blocks_per_seg'] = controldata_blocks_per_segment(table_info['pgdata'])
        table_info['bytes_per_seg'] = table_info['block_size'] * table_info['blocks_per_seg']

    cur.close()
    return table_info

def operate(table_info,byte_offset,operation,value):
    if byte_offset > table_info['relation_size']:
        print "Error:  trying to change byte %s but relation %s is only %s bytes" % \
            (byte_offset, table_info['relation'],table_info['relation_size'])
        return

    if byte_offset < 0:
        print "Error:  cannot use negative byte offsets"
        return

    file_name=table_info['base_file_name']
    file_seq=int(byte_offset / table_info['bytes_per_seg'])
    if file_seq > 0:
        file_name=file_name + ".%s" % file_seq
        file_offset = byte_offset - file_seq * table_info['bytes_per_seg']
    else:
        file_offset = byte_offset

    print "Reading byte",file_offset,"wi

Re: [HACKERS] Parallel query execution

2013-01-15 Thread Michael Paquier
On Wed, Jan 16, 2013 at 7:14 AM, Bruce Momjian  wrote:

> I mentioned last year that I wanted to start working on parallelism:
>
> https://wiki.postgresql.org/wiki/Parallel_Query_Execution
>
> Years ago I added thread-safety to libpq.  Recently I added two parallel
> execution paths to pg_upgrade.  The first parallel path allows execution
> of external binaries pg_dump and psql (to restore).  The second parallel
> path does copy/link by calling fork/thread-safe C functions.  I was able
> to do each in 2-3 days.
>
> I believe it is time to start adding parallel execution to the backend.
> We already have some parallelism in the backend:
> effective_io_concurrency and helper processes.  I think it is time we
> start to consider additional options.
>
> Parallelism isn't going to help all queries, in fact it might be just a
> small subset, but it will be the larger queries.  The pg_upgrade
> parallelism only helps clusters with multiple databases or tablespaces,
> but the improvements are significant.
>
> I have summarized my ideas by updating our Parallel Query Execution wiki
> page:
>
> https://wiki.postgresql.org/wiki/Parallel_Query_Execution
>
> Please consider updating the page yourself or posting your ideas to this
> thread.  Thanks.
>
Honestly that would be a great feature, and I would be happy helping
working on it.
Taking advantage of parallelism in a server with multiple core, especially
for things like large sorting operations would be great.
Just thinking loudly, but wouldn't it be the role of the planner to
determine if such or such query is worth using parallelism? The executor
would then be in charge of actually firing the tasks in parallel that
planner has determined necessary to do.
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] Curious buildfarm failures (fwd)

2013-01-15 Thread Andres Freund
On 2013-01-16 00:26:01 +0100, Andres Freund wrote:
> On 2013-01-15 17:56:40 -0500, Tom Lane wrote:
> > Andres Freund  writes:
> > > I played a bit arround (thanks Sergey!) and it seems to be some rather
> > > strange optimization issue around the fsync request queue.
> > 
> > > Namely changing 
> > >   request->rnode = rnode;
> > > into
> > >   request->rnode.spcNode = rnode.spcNode;
> > >   request->rnode.dbNode = rnode.dbNode;
> > >   request->rnode.relNode = rnode.relNode;
> > > makes it pass reliably.
> > 
> > Jeez.  That's my candidate for weird compiler bug of the month.
> > 
> > > How the hell thats correlating with the elog changes I don't yet know.
> > 
> > There is an elog(ERROR) further up in the same function, but it's sure
> > not clear how that could cause the compiler to misimplement a struct
> > assignment.
> 
> Indeed, replacing the elog() there with a plain abort() or the old-style
> elog definition makes it work. Just using a do-while with the old
> definition inside makes it fail.
> 
> My IA64 knowledge is pretty basic, but I would guess this is stack or
> code alignment related; I seem to remember quite some strange
> requirements there.

FWIW, it's also triggerable if two other function calls are placed inside
the above if() (I tried fprintf(stderr, "argh") and kill(0, 0)).
It seems the change just made an existing issue visible.

No idea what to do about it.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Patch: log level change in gistsplit.c

2013-01-15 Thread Josh Hansen
Hello all,

We've been seeing these same messages flying by our logs for years:

2012-12-13 22:28:07 GMT LOG:  picksplit method for column 2 of index 
"xxx_idx" doesn't support secondary split

I see from the linked message that Tom Lane recommended changing the log
level from LOG to DEBUG1. As such, I've gone ahead and made this change
and attached the patch file.
http://archives.postgresql.org/pgsql-general/2007-08/msg01810.php

The patch itself was created against the latest master.

Thanks!

Josh Hansen





picksplit_loglevel.patch
Description: picksplit_loglevel.patch



Re: [HACKERS] pkg-config files for libpq and ecpg

2013-01-15 Thread Tom Lane
Peter Eisentraut  writes:
> I'll take another stab at providing pkg-config files for the client-side
> libraries.

This bit:

> + echo 'Libs.private: $(filter-out 
> $(PKG_CONFIG_REQUIRES_PRIVATE:lib%=-l%),$(filter-out -L..%, $(SHLIB_LINK)))' 
> >>$@

appears to assume that SHLIB_LINK contains nothing except -L and -l
switches.  I don't think I trust that a whole lot --- in fact, it
looks guaranteed to fail on HPUX because of -print-libgcc-file-name.
There might be other platform-specific bogosity on other platforms;
PTHREAD_LIBS seems like a likely source for instance.

Another point is that -L switches used at build time might not have much
to do with the environment a .pc file would be used in later.  I see
that you're dropping relative -L paths, which is good as far as it
goes, but I wonder whether it goes far enough.

Perhaps the first step should be to filter in -l%, rather than
filter out -L% ?

regards, tom lane




Re: [HACKERS] pg_ctl idempotent option

2013-01-15 Thread Michael Paquier
On Wed, Jan 16, 2013 at 4:29 AM, Bruce Momjian  wrote:

> That's Japanese for idempotent.  ;-)  LOL
>
+1.
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] [PATCH] COPY .. COMPRESSED

2013-01-15 Thread Andrew Dunstan


On 01/15/2013 06:22 PM, Tom Lane wrote:

Stephen Frost  writes:

Perhaps I'm taking a bit too narrow view of the world, but my thinking
is OLTP won't want things compressed, as it increases latency of
requests, while OLAP users are operating with enough data that they'll
go through the effort to use COPY.

Also, if there are so many people using COPY in their apps, why have we
never seen any submitted patches to extend libpq with functions to
construct/deconstruct COPY data?  Surely somebody would have said to
themselves that they shouldn't be responsible for knowing those escaping
rules.


There are perfectly good libraries in Perl and other languages for 
constructing/deconstructing CSV data. If we didn't have CSV 
import/export I suspect we would have heard lots more howls by now.


cheers

andrew




Re: [HACKERS] Curious buildfarm failures (fwd)

2013-01-15 Thread Andres Freund
On 2013-01-15 17:56:40 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > I played a bit around (thanks Sergey!) and it seems to be some rather
> > strange optimization issue around the fsync request queue.
> 
> > Namely changing 
> > request->rnode = rnode;
> > into
> > request->rnode.spcNode = rnode.spcNode;
> > request->rnode.dbNode = rnode.dbNode;
> > request->rnode.relNode = rnode.relNode;
> > makes it pass reliably.
> 
> Jeez.  That's my candidate for weird compiler bug of the month.
> 
> > How the hell that's correlating with the elog changes I don't yet know.
> 
> There is an elog(ERROR) further up in the same function, but it's sure
> not clear how that could cause the compiler to misimplement a struct
> assignment.

Indeed, replacing the elog() there with a plain abort() or the old-style
elog definition makes it work. Just using a do-while with the old
definition inside makes it fail.

My IA64 knowledge is pretty basic, but I would guess this is stack or
code alignment related; I seem to remember quite some strange
requirements there.

>  Maybe the problem is not in those lines alone, but the fact
> that rnode is a pass-by-value struct?  (That is, maybe it's the value of
> the rnode local variable that's getting munged, somewhere up near the
> elog call?)

No, I found this because I printed the values before enqueuing them
into shmem and after dequeuing. After noticing that they didn't match, I
added more...

> We tend to not use pass-by-value struct params much, so we
> might not have noticed a compiler bug associated with that.  Or IOW,
> does changing ForwardFsyncRequest to use a "const RelFileNode *rnode"
> parameter make it go away?

Nope, same thing.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] COPY .. COMPRESSED

2013-01-15 Thread Tom Lane
Stephen Frost  writes:
> Perhaps I'm taking a bit too narrow view of the world, but my thinking
> is OLTP won't want things compressed, as it increases latency of
> requests, while OLAP users are operating with enough data that they'll
> go through the effort to use COPY.

Also, if there are so many people using COPY in their apps, why have we
never seen any submitted patches to extend libpq with functions to
construct/deconstruct COPY data?  Surely somebody would have said to
themselves that they shouldn't be responsible for knowing those escaping
rules.

regards, tom lane




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Bruce Momjian
On Tue, Jan 15, 2013 at 06:15:57PM -0500, Stephen Frost wrote:
> * Gavin Flower (gavinflo...@archidevsys.co.nz) wrote:
> > How about being aware of multiple spindles - so if the requested
> > data covers multiple spindles, then data could be extracted in
> > parallel. This may, or may not, involve multiple I/O channels?
> 
> Yes, this should dovetail with partitioning and tablespaces to pick up
> on exactly that.  We're implementing our own poor-man's parallelism
> using exactly this to use as much of the CPU and I/O bandwidth as we
> can.  I have every confidence that it could be done better and be
> simpler for us if it was handled in the backend.

Yes, I have listed tablespaces and partitions as possible parallel
options on the wiki.

> > On large multiple processor machines, there are different blocks of
> > memory that might be accessed at different speeds depending on the
> > processor. Possibly a mechanism could be used to split a transaction
> > over multiple processors to ensure the fastest memory is used?
> 
> Let's work on getting it working on the h/w that PG is most commonly
> deployed on first..  I agree that we don't want to paint ourselves into
> a corner with this, but I don't think massive NUMA systems are what we
> should focus on first (are you familiar with any that run PG today..?).
> I don't expect we're going to be trying to fight with the Linux (or
> whatever) kernel over what threads run on what processors with access to
> what memory on small-NUMA systems (x86-based).

Agreed.

> > Once a selection of rows has been made, then if there is a lot of
> > reformatting going on, then could this be done in parallel?  I can
> > think of 2 very simplistic strategies: (A) use a different
> > processor core for each column, or (B) farm out sets of rows to
> > different cores.  I am sure in reality, there are more subtleties
> > and aspects of both the strategies will be used in a hybrid fashion
> > along with other approaches.
> 
> Given our row-based storage architecture, I can't imagine we'd do
> anything other than take a row-based approach to this..  I would think
> we'd do two things: parallelize based on partitioning, and parallelize
> seqscan's across the individual heap files which are split on a per-1G
> boundary already.  Perhaps we can generalize that and scale it based on
> the number of available processors and the size of the relation but I
> could see advantages in matching up with what the kernel thinks are
> independent files.

The 1GB idea is interesting.  I found in pg_upgrade that file copy would
just overwhelm the I/O channel, and that doing multiple copies on the
same device had no win, but those were pure I/O operations --- a
sequential scan might be enough of a mix of I/O and CPU that parallelism
might help.

> > I expect that before any parallel algorithm is invoked, then some
> > sort of threshold needs to be exceeded to make it worth while.
> 
> Certainly.  That'd need to be included in the optimization model to
> support this.

I have updated the wiki to reflect the ideas mentioned above.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] [PATCH] COPY .. COMPRESSED

2013-01-15 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Really?  Given that libpq provides no useful support for doing anything
>> with COPY data, much less higher-level packages such as Perl DBI, I'd
>> venture that the real-world ratio is more like 90/10.  If not 99/1.

> Perhaps I'm taking a bit too narrow view of the world, but my thinking
> is OLTP won't want things compressed, as it increases latency of
> requests, while OLAP users are operating with enough data that they'll
> go through the effort to use COPY.

I should think the argument for or against wire-protocol compression
depends mainly on your network environment, not the nature of your
application.  Either bytes sent are more expensive than CPU cycles at
each end, or vice versa.  Latency could be a big deal if we weren't
going to force compressor flushes at synchronization boundaries, but
if we are, any added latency is a matter of a few cycles at most.

regards, tom lane




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Stephen Frost
* Gavin Flower (gavinflo...@archidevsys.co.nz) wrote:
> How about being aware of multiple spindles - so if the requested
> data covers multiple spindles, then data could be extracted in
> parallel. This may, or may not, involve multiple I/O channels?

Yes, this should dovetail with partitioning and tablespaces to pick up
on exactly that.  We're implementing our own poor-man's parallelism
using exactly this to use as much of the CPU and I/O bandwidth as we
can.  I have every confidence that it could be done better and be
simpler for us if it was handled in the backend.

> On large multiple processor machines, there are different blocks of
> memory that might be accessed at different speeds depending on the
> processor. Possibly a mechanism could be used to split a transaction
> over multiple processors to ensure the fastest memory is used?

Let's work on getting it working on the h/w that PG is most commonly
deployed on first..  I agree that we don't want to paint ourselves into
a corner with this, but I don't think massive NUMA systems are what we
should focus on first (are you familiar with any that run PG today..?).
I don't expect we're going to be trying to fight with the Linux (or
whatever) kernel over what threads run on what processors with access to
what memory on small-NUMA systems (x86-based).

> Once a selection of rows has been made, then if there is a lot of
> reformatting going on, then could this be done in parallel?  I can
> think of 2 very simplistic strategies: (A) use a different
> processor core for each column, or (B) farm out sets of rows to
> different cores.  I am sure in reality, there are more subtleties
> and aspects of both the strategies will be used in a hybrid fashion
> along with other approaches.

Given our row-based storage architecture, I can't imagine we'd do
anything other than take a row-based approach to this..  I would think
we'd do two things: parallelize based on partitioning, and parallelize
seqscan's across the individual heap files which are split on a per-1G
boundary already.  Perhaps we can generalize that and scale it based on
the number of available processors and the size of the relation but I
could see advantages in matching up with what the kernel thinks are
independent files.

> I expect that before any parallel algorithm is invoked, then some
> sort of threshold needs to be exceeded to make it worth while.

Certainly.  That'd need to be included in the optimization model to
support this.

Thanks,

Stephen




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Bruce Momjian
On Wed, Jan 16, 2013 at 12:03:50PM +1300, Gavin Flower wrote:
> On 16/01/13 11:14, Bruce Momjian wrote:
> 
> I mentioned last year that I wanted to start working on parallelism:
> 
> https://wiki.postgresql.org/wiki/Parallel_Query_Execution
> 
> Years ago I added thread-safety to libpq.  Recently I added two parallel
> execution paths to pg_upgrade.  The first parallel path allows execution
> of external binaries pg_dump and psql (to restore).  The second parallel
> path does copy/link by calling fork/thread-safe C functions.  I was able
> to do each in 2-3 days.
> 
> I believe it is time to start adding parallel execution to the backend.
> We already have some parallelism in the backend:
> effective_io_concurrency and helper processes.  I think it is time we
> start to consider additional options.
> 
> Parallelism isn't going to help all queries, in fact it might be just a
> small subset, but it will be the larger queries.  The pg_upgrade
> parallelism only helps clusters with multiple databases or tablespaces,
> but the improvements are significant.
> 
> I have summarized my ideas by updating our Parallel Query Execution wiki
> page:
> 
> https://wiki.postgresql.org/wiki/Parallel_Query_Execution
> 
> Please consider updating the page yourself or posting your ideas to this
> thread.  Thanks.
> 
> 
> Hmm...
> 
> How about being aware of multiple spindles - so if the requested data covers
> multiple spindles, then data could be extracted in parallel.  This may, or may
> not, involve multiple I/O channels?

Well, we usually label these as tablespaces.  I don't know if
spindle-level is a reasonable level to add.

> On large multiple processor machines, there are different blocks of memory 
> that
> might be accessed at different speeds depending on the processor.  Possibly a
> mechanism could be used to split a transaction over multiple processors to
> ensure the fastest memory is used?

That seems too far-out for an initial approach.

> Once a selection of rows has been made, then if there is a lot of reformatting
> going on, then could this be done in parallel?  I can think of 2 very
> simplistic strategies: (A) use a different processor core for each column, or
> (B) farm out sets of rows to different cores.  I am sure in reality, there are
> more subtleties and aspects of both the strategies will be used in a hybrid
> fashion along with other approaches.

Probably #2, but that is going to require having some of the modules be
thread/fork-safe, and that is going to be tricky.

> I expect that before any parallel algorithm is invoked, then some sort of
> threshold needs to be exceeded to make it worth while.  Different aspects of
> the parallel algorithm may have their own thresholds.  It may not be worth
> applying a parallel algorithm for 10 rows from a simple table, but selecting
> 10,000 records from multiple tables each over 10 million rows using joins may
> benefit from more extreme parallelism.

Right, I bet we will need some way to control when the overhead of
parallel execution is worth it.

> I expect that UNIONs, as well as the processing of partitioned tables, may be
> amenable to parallel processing.

Interesting idea on UNION.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] [PATCH] COPY .. COMPRESSED

2013-01-15 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Really?  Given that libpq provides no useful support for doing anything
> with COPY data, much less higher-level packages such as Perl DBI, I'd
> venture that the real-world ratio is more like 90/10.  If not 99/1.

Perhaps I'm taking a bit too narrow view of the world, but my thinking
is OLTP won't want things compressed, as it increases latency of
requests, while OLAP users are operating with enough data that they'll
go through the effort to use COPY.

> There might be a few souls out there who are hardy enough and concerned
> enough with performance to have made their apps speak COPY protocol,
> and not given up on it the first time they hit a quoting/escaping bug
> ... but not many, I bet.

The Perl/PHP/Ruby/etc. users that are writing OLTP systems aren't likely
to be interested in this.  The OLAP users are likely using psql (it's what
we're using to load terabytes of data via COPY; JDBC, DBI, etc. have all
been tried and pale in comparison..).

> The key thing there is to force a stream buffer flush (too lazy to look
> up exactly what zlib calls it, but they have the concept) exactly when
> we're about to do a flush to the socket.  That way we get cross-packet
> compression but don't have a problem with the compressor failing to send
> the last partial message when we need it to.

Yes, I'm familiar with it.

> (My suggestion of an expansible option is for future-proofing, not
> because I think we'd try to support more than one option today.)

Fair enough.

Thanks,

Stephen




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Bruce Momjian
On Tue, Jan 15, 2013 at 11:01:04PM +, Simon Riggs wrote:
> On 15 January 2013 22:55, Bruce Momjian  wrote:
> 
> >> Why is this being discussed now?
> >
> > It is for 9.4 and will take months.  I didn't think there was a better
> > time.  We don't usually discuss features during beta testing.
> 
> Bruce, there are many, many patches on the queue. How will we ever get
> to beta testing if we begin open ended discussions on next release?
> 
> If we can't finish what we've started for 9.3, why talk about 9.4?
> 
> Yes, it's a great topic for discussion, but there are better times.

Like when?  I don't remember a policy of not discussing things now. 
Does anyone else remember this?  Are you saying feature discussion is
only between commit-fests?  Is this written down anywhere?  I only
remember beta-time as a time not to discuss features.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Gavin Flower

On 16/01/13 11:14, Bruce Momjian wrote:

I mentioned last year that I wanted to start working on parallelism:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Years ago I added thread-safety to libpq.  Recently I added two parallel
execution paths to pg_upgrade.  The first parallel path allows execution
of external binaries pg_dump and psql (to restore).  The second parallel
path does copy/link by calling fork/thread-safe C functions.  I was able
to do each in 2-3 days.

I believe it is time to start adding parallel execution to the backend.
We already have some parallelism in the backend:
effective_io_concurrency and helper processes.  I think it is time we
start to consider additional options.

Parallelism isn't going to help all queries, in fact it might be just a
small subset, but it will be the larger queries.  The pg_upgrade
parallelism only helps clusters with multiple databases or tablespaces,
but the improvements are significant.

I have summarized my ideas by updating our Parallel Query Execution wiki
page:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Please consider updating the page yourself or posting your ideas to this
thread.  Thanks.


Hmm...

How about being aware of multiple spindles - so if the requested data 
covers multiple spindles, then data could be extracted in parallel. This 
may, or may not, involve multiple I/O channels?


On large multiple processor machines, there are different blocks of 
memory that might be accessed at different speeds depending on the 
processor. Possibly a mechanism could be used to split a transaction 
over multiple processors to ensure the fastest memory is used?


Once a selection of rows has been made, then if there is a lot of 
reformatting going on, then could this be done in parallel?  I can 
think of 2 very simplistic strategies: (A) use a different processor 
core for each column, or (B) farm out sets of rows to different cores.  
I am sure in reality, there are more subtleties and aspects of both the 
strategies will be used in a hybrid fashion along with other approaches.


I expect that before any parallel algorithm is invoked, then some sort 
of threshold needs to be exceeded to make it worth while. Different 
aspects of the parallel algorithm may have their own thresholds. It may 
not be worth applying a parallel algorithm for 10 rows from a simple 
table, but selecting 10,000 records from multiple tables each over 10 
million rows using joins may benefit from more extreme parallelism.


I expect that UNIONs, as well as the processing of partitioned tables, 
may be amenable to parallel processing.



Cheers,
Gavin



Re: [HACKERS] Parallel query execution

2013-01-15 Thread Simon Riggs
On 15 January 2013 22:55, Bruce Momjian  wrote:

>> Why is this being discussed now?
>
> It is for 9.4 and will take months.  I didn't think there was a better
> time.  We don't usually discuss features during beta testing.

Bruce, there are many, many patches on the queue. How will we ever get
to beta testing if we begin open ended discussions on next release?

If we can't finish what we've started for 9.3, why talk about 9.4?

Yes, it's a great topic for discussion, but there are better times.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Curious buildfarm failures (fwd)

2013-01-15 Thread Tom Lane
Andres Freund  writes:
> I played a bit around (thanks Sergey!) and it seems to be some rather
> strange optimization issue around the fsync request queue.

> Namely changing 
>   request->rnode = rnode;
> into
>   request->rnode.spcNode = rnode.spcNode;
>   request->rnode.dbNode = rnode.dbNode;
>   request->rnode.relNode = rnode.relNode;
> makes it pass reliably.

Jeez.  That's my candidate for weird compiler bug of the month.

> How the hell that's correlating with the elog changes I don't yet know.

There is an elog(ERROR) further up in the same function, but it's sure
not clear how that could cause the compiler to misimplement a struct
assignment.  Maybe the problem is not in those lines alone, but the fact
that rnode is a pass-by-value struct?  (That is, maybe it's the value of
the rnode local variable that's getting munged, somewhere up near the
elog call?)  We tend to not use pass-by-value struct params much, so we
might not have noticed a compiler bug associated with that.  Or IOW,
does changing ForwardFsyncRequest to use a "const RelFileNode *rnode"
parameter make it go away?

regards, tom lane




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Simon Riggs
On 15 January 2013 22:14, Bruce Momjian  wrote:

> I mentioned last year that I wanted to start working on parallelism:

We don't normally begin discussing topics for next release just as a
CF is starting.

Why is this being discussed now?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Bruce Momjian
On Tue, Jan 15, 2013 at 10:53:29PM +, Simon Riggs wrote:
> On 15 January 2013 22:14, Bruce Momjian  wrote:
> 
> > I mentioned last year that I wanted to start working on parallelism:
> 
> We don't normally begin discussing topics for next release just as a
> CF is starting.
> 
> Why is this being discussed now?

It is for 9.4 and will take months.  I didn't think there was a better
time.  We don't usually discuss features during beta testing.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] [PATCH] COPY .. COMPRESSED

2013-01-15 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> I find the argument that this supports compression-over-the-wire to be
>> quite weak, because COPY is only one form of bulk data transfer, and
>> one that a lot of applications don't ever use.  If we think we need to
>> support transmission compression for ourselves, it ought to be
>> integrated at the wire protocol level, not in COPY.

> As far as I can tell, COPY is the option which is strongly recommended
> for bulk-data-operations.  I can see the use-case for wanting SELECT
> results to be compressed, but it strikes me as the 10% case for PG users
> rather than the 90% one.  Ditto for COPY vs. large INSERT .. VALUES.

Really?  Given that libpq provides no useful support for doing anything
with COPY data, much less higher-level packages such as Perl DBI, I'd
venture that the real-world ratio is more like 90/10.  If not 99/1.
There might be a few souls out there who are hardy enough and concerned
enough with performance to have made their apps speak COPY protocol,
and not given up on it the first time they hit a quoting/escaping bug
... but not many, I bet.

> Compressing every small packet seems like it'd be overkill and might
> surprise people by actually reducing performance in the case of lots of
> small requests.

Yeah, proper selection and integration of a compression method would be
critical, which is one reason that I'm not suggesting a plugin for this.
You couldn't expect any-random-compressor to work well.  I think zlib
would be okay though when making use of its stream compression features.
The key thing there is to force a stream buffer flush (too lazy to look
up exactly what zlib calls it, but they have the concept) exactly when
we're about to do a flush to the socket.  That way we get cross-packet
compression but don't have a problem with the compressor failing to send
the last partial message when we need it to.
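
For concreteness, the pattern is roughly the sketch below -- Z_SYNC_FLUSH
is the zlib flush in question, while send_to_socket() and the buffer size
are placeholders, not anything that exists in libpq:

#include <stddef.h>
#include <zlib.h>

/*
 * Compress "len" bytes from "data" and hand the compressed bytes to
 * send_to_socket().  Z_SYNC_FLUSH forces zlib to emit everything it has
 * buffered so far, so the peer can decompress the complete message even
 * though the stream stays open for cross-packet compression.  "strm" must
 * have been set up once with deflateInit().
 */
static int
compress_and_flush(z_stream *strm, const char *data, size_t len,
                   void (*send_to_socket) (const char *, size_t))
{
    char        outbuf[8192];

    strm->next_in = (Bytef *) data;
    strm->avail_in = (uInt) len;

    do
    {
        strm->next_out = (Bytef *) outbuf;
        strm->avail_out = sizeof(outbuf);

        if (deflate(strm, Z_SYNC_FLUSH) == Z_STREAM_ERROR)
            return -1;

        send_to_socket(outbuf, sizeof(outbuf) - strm->avail_out);
    } while (strm->avail_out == 0);

    return 0;
}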

(My suggestion of an expansible option is for future-proofing, not
because I think we'd try to support more than one option today.)

regards, tom lane




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Bruce Momjian
On Tue, Jan 15, 2013 at 10:39:10PM +, Peter Geoghegan wrote:
> On 15 January 2013 22:14, Bruce Momjian  wrote:
> > I believe it is time to start adding parallel execution to the backend.
> > We already have some parallelism in the backend:
> > effective_io_concurrency and helper processes.  I think it is time we
> > start to consider additional options.
> 
> A few months back, I remarked [1] that speeding up sorting using
> pipelining and asynchronous I/O was probably parallelism low-hanging
> fruit. That hasn't changed, though I personally still don't have the
> bandwidth to look into it in a serious way.
> 
> [1] 
> http://www.postgresql.org/message-id/caeylb_vezpkdx54vex3x30oy_uoth89xoejjw6aucjjiujs...@mail.gmail.com

OK, I added the link to the wiki.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Peter Geoghegan
On 15 January 2013 22:14, Bruce Momjian  wrote:
> I believe it is time to start adding parallel execution to the backend.
> We already have some parallelism in the backend:
> effective_io_concurrency and helper processes.  I think it is time we
> start to consider additional options.

A few months back, I remarked [1] that speeding up sorting using
pipelining and asynchronous I/O was probably parallelism low-hanging
fruit. That hasn't changed, though I personally still don't have the
bandwidth to look into it in a serious way.

[1] 
http://www.postgresql.org/message-id/caeylb_vezpkdx54vex3x30oy_uoth89xoejjw6aucjjiujs...@mail.gmail.com

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
> Parallelism isn't going to help all queries, in fact it might be just a
> small subset, but it will be the larger queries.  The pg_upgrade
> parallelism only helps clusters with multiple databases or tablespaces,
> but the improvements are significant.

This would be fantastic and I'd like to help.  Parallel query and real
partitioning are two of our biggest holes for OLAP and data warehouse
users.

> Please consider updating the page yourself or posting your ideas to this
> thread.  Thanks.

Will do.

Thanks,

Stephen




Re: [HACKERS] Curious buildfarm failures (fwd)

2013-01-15 Thread Andres Freund
On 2013-01-15 14:40:11 -0500, Tom Lane wrote:
> Sergey Koposov  writes:
> > And I do see the tblspc file left after the finish of "make check":
> > tmp_check/data/pg_tblspc/16385/PG_9.3_201212081/16384/16387
> 
> Interesting.  If the tests are run immediately after initdb, 16387
> is the relfilenode assigned to table "foo" in the tablespace regression
> test.  But why would only that table be left behind?  There are half
> a dozen relations in that tablespace at the point of the DROP CASCADE.
> 
> BTW, I just finished trying to reproduce this on an IA64 machine
> belonging to Red Hat, without success.  So that seems to eliminate
> any possibility of the machine architecture being the trigger issue.
> The compiler's still a likely cause though.
> 
> Anybody have a similar ICC version (dugong's says it is 10.0 20070809)
> to try?  Also, Sergey, can you find a non-dot-zero release to try?

I played a bit around (thanks Sergey!) and it seems to be some rather
strange optimization issue around the fsync request queue.

Namely changing 

/* OK, insert request */
request = 
&CheckpointerShmem->requests[CheckpointerShmem->num_requests++];
request->rnode = rnode;
request->forknum = forknum;
request->segno = segno;
into
/* OK, insert request */
request = 
&CheckpointerShmem->requests[CheckpointerShmem->num_requests++];
request->rnode.spcNode = rnode.spcNode;
request->rnode.dbNode = rnode.dbNode;
request->rnode.relNode = rnode.relNode;
request->forknum = forknum;
request->segno = segno;
makes it pass reliably.

Displaying the values of the request after the assignment, but without
the change, shows bogus values showing up, which explains the problems.

How the hell that's correlating with the elog changes I don't yet know.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] COPY .. COMPRESSED

2013-01-15 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
> COPY with a file name is super-user-only.  I am unclear how you would
> use STDIN/STDOUT in any meaningful way with binary data produced by
> compression.  I guess you could with libpq.

The patch that I posted provided this:

psql -h myhost -c "COPY mytable FROM STDIN COMPRESSED;" < myfile.gz

With the compressed file being transferred unmolested to the server side
where it was decompressed and processed by the server.

Thanks,

Stephen




Re: [HACKERS] [PATCH] COPY .. COMPRESSED

2013-01-15 Thread Bruce Momjian
On Tue, Jan 15, 2013 at 04:22:48PM -0500, Stephen Frost wrote:
> * Bruce Momjian (br...@momjian.us) wrote:
> > Well, COPY is super-user only, so it seems only useful for FDW, no?  We
> > already have lots of user-configuration FDW commands, so I can see
> > adding this one too.
> 
> COPY is most certainly not superuser-only..  COPY w/ popen, if that
> popen can call anything, would certainly have to be superuser-only.

COPY with a file name is super-user-only.  I am unclear how you would
use STDIN/STDOUT in any meaningful way with binary data produced by
compression.  I guess you could with libpq.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] [PATCH] COPY .. COMPRESSED

2013-01-15 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> I find the argument that this supports compression-over-the-wire to be
> quite weak, because COPY is only one form of bulk data transfer, and
> one that a lot of applications don't ever use.  If we think we need to
> support transmission compression for ourselves, it ought to be
> integrated at the wire protocol level, not in COPY.

As far as I can tell, COPY is the option which is strongly recommended
for bulk-data-operations.  I can see the use-case for wanting SELECT
results to be compressed, but it strikes me as the 10% case for PG users
rather than the 90% one.  Ditto for COPY vs. large INSERT .. VALUES.

Compressing every small packet seems like it'd be overkill and might
surprise people by actually reducing performance in the case of lots of
small requests.

It also strikes me as a bit silly to do something like:

zcat myfile.gz | \
psql -Z -c "COPY mytable FROM STDIN;"

> Just to not look like I'm rejecting stuff without proposing
> alternatives, here is an idea about a backwards-compatible design for
> doing that: we could add an option that can be set in the connection
> request packet.  Say, "transmission_compression = gzip".

Alright, do we want/need to support multiple options there?  What do
people think we should support?  Any other particular details or issues
that come to mind with such an implementation?

I'm willing to work through that if it's the route everyone agrees with.

Thanks,

Stephen




[HACKERS] Parallel query execution

2013-01-15 Thread Bruce Momjian
I mentioned last year that I wanted to start working on parallelism:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Years ago I added thread-safety to libpq.  Recently I added two parallel
execution paths to pg_upgrade.  The first parallel path allows execution
of external binaries pg_dump and psql (to restore).  The second parallel
path does copy/link by calling fork/thread-safe C functions.  I was able
to do each in 2-3 days.

I believe it is time to start adding parallel execution to the backend. 
We already have some parallelism in the backend:
effective_io_concurrency and helper processes.  I think it is time we
start to consider additional options.

Parallelism isn't going to help all queries, in fact it might be just a
small subset, but it will be the larger queries.  The pg_upgrade
parallelism only helps clusters with multiple databases or tablespaces,
but the improvements are significant.

I have summarized my ideas by updating our Parallel Query Execution wiki
page: 

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Please consider updating the page yourself or posting your ideas to this
thread.  Thanks.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] unlogged tables vs. GIST

2013-01-15 Thread Robert Haas
On Tue, Jan 15, 2013 at 4:26 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> I think that might be acceptable from a performance point of view -
>> after all, if the index is unlogged, you're saving the cost of WAL -
>> but I guess I still prefer a generic solution to this problem (a
>> generalization of GetXLogRecPtrForTemp) rather than a special-purpose
>> solution based on the nitty-gritty of how GiST uses these values.
>> What's the difference between storing this value in pg_control and,
>> say, the OID counter?
>
> Well, the modularity argument is that GiST shouldn't have any special
> privileges compared to a third-party index AM.  (I realize that
> third-party AMs already have problems plugging into WAL replay, but
> that doesn't mean we should create more problems.)
>
> We could possibly dodge that objection by regarding the global counter
> as some sort of generic "unlogged operation counter", available to
> anybody who needs it.  It would be good to have a plausible example of
> something else needing it, but assume somebody can think of one.
>
> The bigger issue is that the reason we don't have to update pg_control
> every other millisecond is that the OID counter is capable of tracking
> its state between checkpoints without touching pg_control, that is it
> can emit WAL records to track its increments.  I think that we should
> insist that GiST do likewise, even if we give it some space in
> pg_control.  Remember that pg_control is a single point of failure for
> the database, and the more often it's written to, the more likely it is
> that something will go wrong there.
>
> So I guess what would make sense to me is that we invent an "unlogged
> ops counter" that is managed exactly like the OID counter, including
> having WAL records that are treated as consuming some number of values
> in advance.  If it's 64 bits wide then the WAL records could safely be
> made to consume quite a lot of values, like a thousand or so, thus
> reducing the actual WAL I/O burden to about nothing.

I didn't look at the actual patch (silly me?) but the only time you
need to update the control file is when writing the shutdown
checkpoint just before stopping the database server.  If the server
crashes, it's OK to roll the value back to some smaller value, because
unlogged relations will be reset anyway.  And while the server is
running the information can live in a shared memory copy protected by
a spinlock.  So the control file traffic should be limited to once per
server lifetime, AFAICS.
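
Very roughly, and with invented names throughout (nothing below exists
today), the shape of that would be:

#include "postgres.h"
#include "storage/spin.h"

typedef struct
{
    slock_t     mutex;          /* protects counter */
    uint64      counter;        /* next fake LSN to hand out */
} UnloggedLSNShmem;

static UnloggedLSNShmem *unloggedLSN;   /* points into shared memory */

extern void UpdateControlFileUnloggedLSN(uint64 counter);  /* invented helper */

/* Cheap enough to call on every unlogged-index page split. */
static uint64
GetNextUnloggedLSN(void)
{
    uint64      val;

    SpinLockAcquire(&unloggedLSN->mutex);
    val = unloggedLSN->counter++;
    SpinLockRelease(&unloggedLSN->mutex);

    return val;
}

/* Called only from the shutdown-checkpoint path, i.e. once per server lifetime. */
static void
FlushUnloggedLSNToControlFile(void)
{
    uint64      val;

    SpinLockAcquire(&unloggedLSN->mutex);
    val = unloggedLSN->counter;
    SpinLockRelease(&unloggedLSN->mutex);

    UpdateControlFileUnloggedLSN(val);
}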

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] COPY .. COMPRESSED

2013-01-15 Thread Tom Lane
Stephen Frost  writes:
> * Bruce Momjian (br...@momjian.us) wrote:
>> And this leads to support-my-compression-binary-of-the-day mess.  Why
>> not just allow them to do '|compression-binary'?

> The popen patch doesn't support the '|compression-binary' option through
> the FE protocol.  Even if it did, it would only be available for
> superusers as we can't allow regular users to run arbitrary commands on
> the server-side.

I find the argument that this supports compression-over-the-wire to be
quite weak, because COPY is only one form of bulk data transfer, and
one that a lot of applications don't ever use.  If we think we need to
support transmission compression for ourselves, it ought to be
integrated at the wire protocol level, not in COPY.

Just to not look like I'm rejecting stuff without proposing
alternatives, here is an idea about a backwards-compatible design for
doing that: we could add an option that can be set in the connection
request packet.  Say, "transmission_compression = gzip".

regards, tom lane




Re: [HACKERS] json api WIP patch

2013-01-15 Thread Gavin Flower

On 16/01/13 08:04, David Fetter wrote:

On Mon, Jan 14, 2013 at 07:52:56PM -0500, Andrew Dunstan wrote:

On 01/14/2013 07:36 PM, Merlin Moncure wrote:

While testing this I noticed that integer based 'get' routines are
zero based -- was this intentional?  Virtually all other aspects of
SQL are 1 based:

postgres=# select json_get('[1,2,3]', 1);
  json_get
--
  2
(1 row)

postgres=# select json_get('[1,2,3]', 0);
  json_get
--
  1
(1 row)

Yes. it's intentional. SQL arrays might be 1-based by default, but
JavaScript arrays are not. JsonPath and similar gadgets treat the
arrays as zero-based. I suspect the Json-using community would not
thank us for being overly SQL-centric on this - and I say that as
someone who has always thought zero based arrays were a major design
mistake, responsible for countless off-by-one errors.

Perhaps we could compromise by making arrays 0.5-based.

Cheers,
David.
I think that is far too rational; perhaps the reciprocal of the golden
ratio (0.618033...) would be more appropriate?


I used to be insistent that arrays should start with 1, but now I find 
starting at 0 far more natural - because every time you start an array at 
1, the computer has to subtract 1 in order to calculate the entry.  Also, 
both Java & C are zero-based.


I first learnt FORTRAN IV, which is 1-based, and had a shock when I was 
learning Algol and found it was 0-based - many moons ago...



Cheers,
Gavin


Re: [HACKERS] json api WIP patch

2013-01-15 Thread David E. Wheeler
On Jan 15, 2013, at 12:17 PM, Andrew Dunstan  wrote:

> I doubt I'm very representative either. People like David Wheeler, Taras 
> Mitran, Joe Van Dyk, and the Heroku guys would be better people to ask than 
> me. I'm quite prepared to change it if that's the consensus.

They’re JSON arrays, not SQL arrays, and JSON arrays are based on JavaScript, 
where they are 0-based.

Best,

David





[HACKERS] Re: [BUGS] BUG #7809: Running pg_dump on slave w/ streaming replication fails if there are unlogged tables

2013-01-15 Thread Magnus Hagander
On Tue, Jan 15, 2013 at 12:13 AM,   wrote:
> The following bug has been logged on the website:
>
> Bug reference:  7809
> Logged by:  Joe Van Dyk
> Email address:  j...@tanga.com
> PostgreSQL version: 9.2.2
> Operating system:   Ubuntu
> Description:
>
> Running pg_dump on a streaming replication slave with a database that has
> unlogged_tables will fail unless you provide the "--no-unlogged-table-data"
> option with the following (scary) error:
>
> pg_dump: Dumping the contents of table "tracking_import_data" failed:
> PQgetResult() failed.
> pg_dump: Error message from server: ERROR:  could not open file
> "base/16388/190326": No such file or directory
> pg_dump: The command was: COPY public.tracking_import_data (uuid,
> tracking_number) TO stdout;
>
> (this guy  encountered the error as well:
> http://www.postgresql.org/message-id/de2de764-307d-4a23-a9a9-6608ac097...@ticketevolution.com
> )
>
> Could running pg_dump against a slave always use the
> "--no-unlogged-table-data" option?

That sounds like a pretty reasonable idea, I think. Should be easy
enough to figure out at an early stage, too.

That said, it wouldn't hurt if we could make that error a little less
scary. Instead of saying "could not open file", could we find a way to
say "this is an unlogged table on a slave, it's not going to work"?

We can fix pg_dump the easy way, but what about custom tools...
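
For what it's worth, the detection itself could be as small as the sketch
below (the helper name is made up; where pg_dump would call it is a
separate question):

#include <stdbool.h>
#include <string.h>
#include <libpq-fe.h>

/*
 * Return true when the connection is to a server in recovery (a hot
 * standby), in which case pg_dump could behave as though
 * --no-unlogged-table-data had been given.
 */
static bool
server_is_in_recovery(PGconn *conn)
{
    PGresult   *res;
    bool        in_recovery = false;

    res = PQexec(conn, "SELECT pg_catalog.pg_is_in_recovery()");
    if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) == 1)
        in_recovery = (strcmp(PQgetvalue(res, 0, 0), "t") == 0);
    PQclear(res);

    return in_recovery;
}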


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [HACKERS] unlogged tables vs. GIST

2013-01-15 Thread Tom Lane
Robert Haas  writes:
> I think that might be acceptable from a performance point of view -
> after all, if the index is unlogged, you're saving the cost of WAL -
> but I guess I still prefer a generic solution to this problem (a
> generalization of GetXLogRecPtrForTemp) rather than a special-purpose
> solution based on the nitty-gritty of how GiST uses these values.
> What's the difference between storing this value in pg_control and,
> say, the OID counter?

Well, the modularity argument is that GiST shouldn't have any special
privileges compared to a third-party index AM.  (I realize that
third-party AMs already have problems plugging into WAL replay, but
that doesn't mean we should create more problems.)

We could possibly dodge that objection by regarding the global counter
as some sort of generic "unlogged operation counter", available to
anybody who needs it.  It would be good to have a plausible example of
something else needing it, but assume somebody can think of one.

The bigger issue is that the reason we don't have to update pg_control
every other millisecond is that the OID counter is capable of tracking
its state between checkpoints without touching pg_control, that is it
can emit WAL records to track its increments.  I think that we should
insist that GiST do likewise, even if we give it some space in
pg_control.  Remember that pg_control is a single point of failure for
the database, and the more often it's written to, the more likely it is
that something will go wrong there.

So I guess what would make sense to me is that we invent an "unlogged
ops counter" that is managed exactly like the OID counter, including
having WAL records that are treated as consuming some number of values
in advance.  If it's 64 bits wide then the WAL records could safely be
made to consume quite a lot of values, like a thousand or so, thus
reducing the actual WAL I/O burden to about nothing.
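
To sketch the analogy (every name below is invented; the real logic this
mimics is the OID counter in GetNewObjectId()):

#include "postgres.h"

#define UNLOGGED_COUNTER_PREFETCH   1024

/* In reality this state would live in shared memory, under a lock. */
static uint64 nextUnloggedValue = 1;    /* next value to hand out */
static uint32 unloggedValuesLeft = 0;   /* values still covered by WAL */

extern void XLogPutNextUnloggedCounter(uint64 nextCounter);     /* invented */

static uint64
GetNextUnloggedCounter(void)
{
    if (unloggedValuesLeft == 0)
    {
        /*
         * Emit one WAL record that "consumes" the next batch of values.
         * After a crash, replay restores the counter to the logged value,
         * so no value is ever handed out twice; at worst part of the batch
         * is wasted, which is harmless.
         */
        XLogPutNextUnloggedCounter(nextUnloggedValue + UNLOGGED_COUNTER_PREFETCH);
        unloggedValuesLeft = UNLOGGED_COUNTER_PREFETCH;
    }

    unloggedValuesLeft--;
    return nextUnloggedValue++;
}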

regards, tom lane




Re: [HACKERS] json api WIP patch

2013-01-15 Thread Daniel Farina
On Tue, Jan 15, 2013 at 12:17 PM, Andrew Dunstan  wrote:
>
> On 01/15/2013 02:47 PM, Merlin Moncure wrote:
>>
>> On Tue, Jan 15, 2013 at 1:04 PM, David Fetter  wrote:
>>>
>>> On Mon, Jan 14, 2013 at 07:52:56PM -0500, Andrew Dunstan wrote:

 On 01/14/2013 07:36 PM, Merlin Moncure wrote:
>
> While testing this I noticed that integer based 'get' routines are
> zero based -- was this intentional?  Virtually all other aspects of
> SQL are 1 based:
>
> postgres=# select json_get('[1,2,3]', 1);
>   json_get
> --
>   2
> (1 row)
>
> postgres=# select json_get('[1,2,3]', 0);
>   json_get
> --
>   1
> (1 row)

 Yes. it's intentional. SQL arrays might be 1-based by default, but
 JavaScript arrays are not. JsonPath and similar gadgets treat the
 arrays as zero-based. I suspect the Json-using community would not
 thank us for being overly SQL-centric on this - and I say that as
 someone who has always thought zero based arrays were a major design
 mistake, responsible for countless off-by-one errors.
>>>
>>> Perhaps we could compromise by making arrays 0.5-based.
>>
>> Well, I'm not prepared to argue with Andrew in this one.  It was
>> surprising behavior to me, but that's sample size one.
>
> I doubt I'm very representative either. People like David Wheeler, Taras
> Mitran, Joe Van Dyk, and the Heroku guys would be better people to ask than
> me. I'm quite prepared to change it if that's the consensus.

Hello.

I'm inclined to go with the same gut feeling you had (zero-based-indexing).

Here is the background for my reasoning:

The downside of zero-based-indexing is that people who want to use
multiple sequential container types will inevitably have to deal with
detailed and not easily type-checked integer coordinates that mean
different things in each domain that will, no doubt, lead to a number
of off-by-one errors.  Nevertheless, this cost is already paid because
one of the first things many people will do in programs generating SQL
queries is try to zero-index a SQL array, swear a bit after figuring
things out (because a NULL will be generated, not an error), and then
adjust all the offsets. So, this is not a new problem.  On many
occasions I'm sure this has caused off-by-one bugs, or the NULLs
slipped through testing and delivered funny results, yet the world
moves on.

On the other hand, the downside of going down the road of 1-based
indexing and attempting to attain relative sameness to SQL arrays, it
would also feel like one would be obliged to implement SQL array
infelicities like 'out of bounds' being SQL NULL rather than an error,
related to other spectres like non-rectangular nested arrays.  SQL
array semantics are complex and The Committee can change them or --
slightly more likely -- add interactions, so it seems like a general
expectation that Postgres container types that happen to have any
reasonable ordinal addressing will implement some level of same-ness
with SQL arrays is a very messy one.  As such, if it becomes customary
to implement one-based indexing of containers, I think such customs
are best carefully circumscribed so that attempts to be 'like' SQL
arrays are only as superficial as that.

What made me come down on the side of zero-based indexing in spite of
the weaknesses are these two reasons:

* The number of people who use JSON and zero-based-indexing is very
  large, and furthermore, within that set the number that know that
  SQL even defines array support -- much less that Postgres implements
  it -- is much smaller. Thus, one is targeting cohesion with a fairly
  alien concept that is not understood by the audience.

* Maintaining PL integrated code that uses both 1-based indexing in PG
  functions and 0-based indexing in embedded languages that are likely
  to be combined with JSON -- doesn't sound very palatable, and the
  use of such PLs (e.g. plv8) seems pretty likely, too.  That can
  probably be a rich source of bugs and frustration.

If one wants SQL array semantics, it seems like the right way to get
them is coercion to a SQL array value.  Then one will receive SQL
array semantics exactly.

--
fdr




Re: [HACKERS] [PATCH] COPY .. COMPRESSED

2013-01-15 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
> Well, COPY is super-user only, so it seems only useful for FDW, no?  We
> already have lots of user-configuration FDW commands, so I can see
> adding this one too.

COPY is most certainly not superuser-only..  COPY w/ popen, if that
popen can call anything, would certainly have to be superuser-only.

COPY TO STDOUT / FROM STDIN is available to and used a huge amount by
non-superusers.  Would be great if we could allow that to work with
compressed data also, imv.

Thanks,

Stephen




Re: [HACKERS] [PATCH] COPY .. COMPRESSED

2013-01-15 Thread Stephen Frost
* Christopher Browne (cbbro...@gmail.com) wrote:
> How about having a "pg_filters" table in pg_catalog which allows capturing
> labels and names of known-to-be-safe binary filters:

I was considering that (though I was thinking they'd be
"transformations" rather than filters; filter implies that you're
removing something, imv), but as I mentioned upthread, there are dangers
in that direction and having a default set of options strikes me as a
lot more challenging to provide.

> insert into pg_filters (label, location)
> values
> ('zcat', '/usr/bin/zcat'),
> ('bzip2', '/usr/bin/bzip2'),
> ('bunzip2', '/usr/bin/bunzip2');

We'd need to include which direction is supported also, I think.

> And then having some capability to grant permissions to roles to use
> these filters.

Yes, an additional ACL system, as I mentioned upthread, would be
required for this.

> That's not a "version 1" capability...  Suppose we have, in 9.3, that there 
> are
> direct references to "|/usr/bin/zcat" (and such), and then hope, in
> 9.4, to tease
> this out to be a non-superuser-capable facility via the above pg_filters?

It would be good to flush out what the syntax, etc, would look like for
this, if we're going to support it, before we go down a road that limits
us in what we can do.  For example, if we implement the existing popen
call, and then later want to allow non-superusers to use certain
filters, how would the non-superuser specify the filter?  I really don't
think we want to be taking the shell-like command provided by a
non-superuser and then try to match that against a list of commands in a
table..

> These filters should be useful for FDWs as well as for COPY.

I'm not sure I see how any FDW beyond file_fdw would really benefit from
this..?  I don't think a MySQL FDW or Redis FDW would gain anything...

Thanks,

Stephen




Re: [HACKERS] unlogged tables vs. GIST

2013-01-15 Thread Robert Haas
On Tue, Jan 15, 2013 at 1:58 PM, Heikki Linnakangas
 wrote:
>> I think what Heikki had in mind was that the copy in the index would be
>> the authoritative one, not some image in shared memory.  This'd imply
>> dirtying the root page on every insert, as well as increased contention
>> for the root page, so it might have performance problems.
>
> Not every insert, just every split. Which might still be a performance
> problem, but an order of magnitude smaller.

I think that might be acceptable from a performance point of view -
after all, if the index is unlogged, you're saving the cost of WAL -
but I guess I still prefer a generic solution to this problem (a
generalization of GetXLogRecPtrForTemp) rather than a special-purpose
solution based on the nitty-gritty of how GiST uses these values.
What's the difference between storing this value in pg_control and,
say, the OID counter?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_ctl idempotent option

2013-01-15 Thread Josh Berkus
On 01/15/2013 07:55 AM, Peter Eisentraut wrote:
> On 1/14/13 10:22 AM, Tom Lane wrote:
>> Idempotent is a ten-dollar word.  Can we find something that average
>> people wouldn't need to consult a dictionary to understand?
> 
> My suggestion in the original thread was --oknodo, but people didn't
> like that either.

I'm pretty sure that Oknodo is an island in the Pacific.  ;-)

I don't have a better idea for a name, though.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sepgsql 3/3] Add db_procedure:execute permission checks

2013-01-15 Thread Kohei KaiGai
This patch adds sepgsql support for permission checks almost equivalent
to the existing FUNCTION EXECUTE privilege.

The feature is built on a new OAT_FUNCTION_EXEC event type that is
invoked around pg_proc_aclcheck(), except for cases where a function's
permissions are checked during CREATE or ALTER commands. (An extension
can handle those cases in the OAT_POST_CREATE or OAT_POST_ALTER hooks
if needed.)

This patch assumes the db_schema:{search} patch is applied first, so
please also check the patches below:
https://commitfest.postgresql.org/action/patch_view?id=1003
https://commitfest.postgresql.org/action/patch_view?id=1065

Thanks,
-- 
KaiGai Kohei 


sepgsql-v9.3-function-execute-permission.v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] count(*) of zero rows returns 1

2013-01-15 Thread Robert Haas
On Tue, Jan 15, 2013 at 2:26 PM, Shaun Thomas  wrote:
> Let's see what EnterpriseDB produces:
>
> test=> select * from dual;
>
>  dummy
> ---
>  X
> (1 row)

Hey, don't blame us.  We didn't come up with this bad idea ... just
trying to make life easier for those who are used to it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] COPY .. COMPRESSED

2013-01-15 Thread Bruce Momjian
On Tue, Jan 15, 2013 at 03:37:07PM -0500, Christopher Browne wrote:
> On Tue, Jan 15, 2013 at 2:53 PM, Stephen Frost  wrote:
> > The popen patch doesn't support the '|compression-binary' option through
> > the FE protocol.  Even if it did, it would only be available for
> > superusers as we can't allow regular users to run arbitrary commands on
> > the server-side.
> 
> That points towards a fix that involves having a set of non-arbitrary commands
> that we allow plain users to use.
> 
> Hmm.  There's an interesting thought...
> 
> How about having a "pg_filters" table in pg_catalog which allows capturing
> labels and names of known-to-be-safe binary filters:
> 
> insert into pg_filters (label, location)
> values
> ('zcat', '/usr/bin/zcat'),
> ('bzip2', '/usr/bin/bzip2'),
> ('bunzip2', '/usr/bin/bunzip2');
> 
> And then having some capability to grant permissions to roles to use
> these filters.
> 
> That's not a "version 1" capability...  Suppose we have, in 9.3, that there
> are direct references to "|/usr/bin/zcat" (and such), and then hope, in 9.4,
> to tease this out to be a non-superuser-capable facility via the above
> pg_filters?
> 
> These filters should be useful for FDWs as well as for COPY.

Well, COPY is super-user only, so it seems only useful for FDW, no?  We
already have lots of user-configuration FDW commands, so I can see
adding this one too.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] erroneous restore into pg_catalog schema

2013-01-15 Thread Robert Haas
On Tue, Jan 15, 2013 at 3:22 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> Or perhaps there is some other way to make sure that the user "really
>> meant it", like refusing to create in pg_catalog unless the schema
>> name is given explicitly.  I kind of like that idea, actually.
>
> That does seem attractive at first glance.  Did you have an
> implementation in mind?  The idea that comes to mind for me is to hack
> namespace.c, either to prevent activeCreationNamespace from getting set
> to "pg_catalog" in the first place, or to throw error in
> LookupCreationNamespace and friends.  I am not sure though if
> LookupCreationNamespace et al ever get called in contexts where no
> immediate object creation is intended (and thus maybe an error wouldn't
> be appropriate).

As far as I can see, the principle place we'd want to hack would be
recomputeNamespacePath(), so that activeCreationNamespace never ends
up pointing to pg_catalog even if that's explicitly listed in
search_path.  The places where we actually work out what schema to use
are RangeVarGetCreationNamespace() and
QualifiedNameGetCreationNamespace(), but those don't seem like they'd
need any adjustment, unless perhaps we wish to whack around the "no
schema has been selected to create in" error message in some way.
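
[Editorial illustration of the distinction, with a hypothetical table name;
whether creation ultimately succeeds also depends on ordinary permissions
and allow_system_table_mods.]

-- target schema chosen implicitly via search_path: would be refused
SET search_path = pg_catalog, public;
CREATE TABLE my_tab (id int);

-- target schema named explicitly: would still be honoured
CREATE TABLE pg_catalog.my_tab (id int);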

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] COPY .. COMPRESSED

2013-01-15 Thread Christopher Browne
On Tue, Jan 15, 2013 at 2:53 PM, Stephen Frost  wrote:
> The popen patch doesn't support the '|compression-binary' option through
> the FE protocol.  Even if it did, it would only be available for
> superusers as we can't allow regular users to run arbitrary commands on
> the server-side.

That points towards a fix that involves having a set of non-arbitrary commands
that we allow plain users to use.

Hmm.  There's an interesting thought...

How about having a "pg_filters" table in pg_catalog which allows capturing
labels and names of known-to-be-safe binary filters:

insert into pg_filters (label, location)
values
('zcat', '/usr/bin/zcat'),
('bzip2', '/usr/bin/bzip2'),
('bunzip2', '/usr/bin/bunzip2');

And then having some capability to grant permissions to roles to use
these filters.

That's not a "version 1" capability...  Suppose we have, in 9.3, that there are
direct references to "|/usr/bin/zcat" (and such), and then hope, in 9.4, to
tease this out to be a non-superuser-capable facility via the above pg_filters?

These filters should be useful for FDWs as well as for COPY.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Curious buildfarm failures (fwd)

2013-01-15 Thread Sergey Koposov

On Tue, 15 Jan 2013, Tom Lane wrote:


BTW, I just finished trying to reproduce this on an IA64 machine
belonging to Red Hat, without success.  So that seems to eliminate
any possibility of the machine architecture being the trigger issue.
The compiler's still a likely cause though.
Anybody have a similar ICC version (dugong's says it is 10.0 20070809)
to try?  Also, Sergey, can you find a non-dot-zero release to try?


I think it is indeed the main issue.
I've tried 10.1 ( 10.1.011 ) and it doesn't fail.

When I tried 11.1 (icc (ICC) 11.1 20100401) it failed in quite a strange
way (I don't quite remember it happening before)


test tablespace   ... ok
parallel group (18 tests):  txid int2 text name oid varchar int4 char money float8 uuid float4 int8 boolean bit enum numeric rangetypes
 boolean  ... ok
 char ... ok
 name ... ok
 varchar  ... ok
 text ... ok
 int2 ... ok
 int4 ... ok
 int8 ... ok
 oid  ... ok
 float4   ... ok
 float8   ... ok
 bit  ... ok
 numeric  ... ok
 txid ... ok
 uuid ... ok
 enum ... ok
 money... ok
 rangetypes   ... FAILED (test process exited with exit code 2)
test strings  ... FAILED (test process exited with exit code 2)
test numerology   ... FAILED (test process exited with exit code 2)
parallel group (19 tests):  path interval time inet circle macaddr comments timestamp timestamptz reltime date tstypes tinterval abstime timetz lseg box polygon point
 point... FAILED (test process exited with exit code 2)
 lseg ... FAILED (test process exited with exit code 2)
 box  ... FAILED (test process exited with exit code 2)
 path ... FAILED (test process exited with exit code 2)
 polygon  ... FAILED (test process exited with exit code 2)
 circle   ... FAILED (test process exited with exit code 2)
 date ... FAILED (test process exited with exit code 2)
 time ... FAILED (test process exited with exit code 2)
 timetz   ... FAILED (test process exited with exit code 2)
 timestamp... FAILED (test process exited with exit code 2)
 timestamptz  ... FAILED (test process exited with exit code 2)
 interval ... FAILED (test process exited with exit code 2)
 abstime  ... FAILED (test process exited with exit code 2)
 reltime  ... FAILED (test process exited with exit code 2)
 tinterval... FAILED (test process exited with exit code 2)
 inet ... FAILED (test process exited with exit code 2)
 macaddr  ... FAILED (test process exited with exit code 2)
 tstypes  ... FAILED (test process exited with exit code 2)
 comments ... FAILED (test process exited with exit code 2)
parallel group (6 tests):  geometry regex horology type_sanity oidjoins opr_sanity
 geometry ... FAILED
 horology ... FAILED
 regex... ok
 oidjoins ... ok
 type_sanity  ... ok
 opr_sanity   ... ok
test insert   ... ok
test create_function_1... ok
test create_type  ... ok
test create_table ... ok
test create_function_2... ok
parallel group (2 tests):  copyselect copy
 copy ... ok
 copyselect   ... ok
parallel group (2 tests):  create_operator create_misc
 create_misc  ... ok
 create_operator  ... ok
parallel group (2 tests):  create_view create_index
 create_index ... FAILED (test process exited with exit code 2)
 create_view  ... ok
parallel group (11 tests):  constraints triggers create_cast create_function_3 updatable_views inherit drop_if_exists create_aggregate create_table_like typed_table vacuum
 create_aggregate ... FAILED (test process exited with exit code 2)
 create_function_3... FAILED (test process exited with exit code 2)
 create_cast  ... FAILED (test process exited with exit code 2)
 constraints  ... FAILED (test process exited with exit code 2)
 triggers ... FAILED (test process exited with exit code 2)
 inherit  ... FAILED (test process exited with exit code 2)
 create_table_like... FAILED (test process exite

Re: [HACKERS] erroneous restore into pg_catalog schema

2013-01-15 Thread Tom Lane
Robert Haas  writes:
> Or perhaps there is some other way to make sure that the user "really
> meant it", like refusing to create in pg_catalog unless the schema
> name is given explicitly.  I kind of like that idea, actually.

That does seem attractive at first glance.  Did you have an
implementation in mind?  The idea that comes to mind for me is to hack
namespace.c, either to prevent activeCreationNamespace from getting set
to "pg_catalog" in the first place, or to throw error in
LookupCreationNamespace and friends.  I am not sure though if
LookupCreationNamespace et al ever get called in contexts where no
immediate object creation is intended (and thus maybe an error wouldn't
be appropriate).

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] json api WIP patch

2013-01-15 Thread Andrew Dunstan


On 01/15/2013 02:47 PM, Merlin Moncure wrote:

On Tue, Jan 15, 2013 at 1:04 PM, David Fetter  wrote:

On Mon, Jan 14, 2013 at 07:52:56PM -0500, Andrew Dunstan wrote:

On 01/14/2013 07:36 PM, Merlin Moncure wrote:

While testing this I noticed that integer based 'get' routines are
zero based -- was this intentional?  Virtually all other aspects of
SQL are 1 based:

postgres=# select json_get('[1,2,3]', 1);
  json_get
--
  2
(1 row)

postgres=# select json_get('[1,2,3]', 0);
  json_get
--
  1
(1 row)

Yes, it's intentional. SQL arrays might be 1-based by default, but
JavaScript arrays are not. JsonPath and similar gadgets treat the
arrays as zero-based. I suspect the Json-using community would not
thank us for being overly SQL-centric on this - and I say that as
someone who has always thought zero based arrays were a major design
mistake, responsible for countless off-by-one errors.

Perhaps we could compromise by making arrays 0.5-based.

Well, I'm not prepared to argue with Andrew in this one.  It was
surprising behavior to me, but that's sample size one.




I doubt I'm very representative either. People like David Wheeler, Taras 
Mitran, Joe Van Dyk, and the Heroku guys would be better people to ask 
than me. I'm quite prepared to change it if that's the consensus.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] Slow query: bitmap scan troubles

2013-01-15 Thread Tom Lane
Bruce Momjian  writes:
> On Mon, Jan 14, 2013 at 12:56:37PM -0500, Tom Lane wrote:
>> Remember also that "enable_seqscan=off" merely adds 1e10 to the
>> estimated cost of seqscans.  For sufficiently large tables this is not
>> exactly a hard disable, just a thumb on the scales.  But I don't know
>> what your definition of "extremely large indexes" is.

> Wow, do we need to bump up that value based on larger modern hardware?

I'm disinclined to bump it up very much.  If it's more than about 1e16,
ordinary cost contributions would disappear into float8 roundoff error,
causing the planner to be making choices that are utterly random except
for minimizing the number of seqscans.  Even at 1e14 or so you'd be
losing a lot of finer-grain distinctions.  What we want is for the
behavior to be "minimize the number of seqscans but plan normally
otherwise", so those other cost contributions are still important.
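
[Editorial illustration: the penalty shows up directly in plan costs; the
table "t" and the exact figures here are hypothetical.]

SET enable_seqscan = off;
EXPLAIN SELECT * FROM t;
--  Seq Scan on t  (cost=10000000000.00..10000000001.05 rows=5 width=4)
-- i.e. 1e10 is added to the estimate; the plan is discouraged, not forbidden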

Anyway, at this point we're merely speculating about what's behind
Robert's report --- I'd want to see some concrete real-world examples
before changing anything.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] erroneous restore into pg_catalog schema

2013-01-15 Thread Robert Haas
On Mon, Jan 14, 2013 at 2:07 PM, Alvaro Herrera
 wrote:
>> Tom Lane wrote:
>> Alvaro Herrera  writes:
>
>> > alvherre=# create extension adminpack;
>> > ERROR:  permission denied for schema pg_catalog
>>
>> Um.  I knew that that module's desire to shove stuff into pg_catalog
>> would bite us someday.  But now that I think about it, I'm pretty sure
>> I recall discussions to the effect that there are other third-party
>> modules doing similar things.
>
> How about we provide a superuser-only function that an extension can
> call which will set enableSystemTableMods?  It would get back
> automatically to the default value on transaction end.  That way,
> extensions that wish to install stuff in pg_catalog can explicitly
> declare it, and the rest of the world enjoys consistent protection.

Or just document the existing GUC and make it something less than
PGC_POSTMASTER, like maybe PGC_SUSER.

But, really, I think allow_system_table_mods paints with too broad a
brush.  It allows both things that are relatively OK (like creating a
function in pg_catalog) and things that are rampantly insane (like
dropping a column from pg_proc).  It might be a good idea to make
those things controlled by two different switches.

Or perhaps there is some other way to make sure that the user "really
meant it", like refusing to create in pg_catalog unless the schema
name is given explicitly.  I kind of like that idea, actually.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] erroneous restore into pg_catalog schema

2013-01-15 Thread Robert Haas
On Sun, Jan 13, 2013 at 4:09 PM, Tom Lane  wrote:
> Right, that is the argument for ignoring missing schemas, and I think it
> is entirely sensible for *search* activities.  But allowing *creation*
> to occur in an indeterminate schema is a horrid idea.

But the default search path is $user, public; and of those two, only
the latter exists by default.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-15 Thread Tom Lane
Robert Haas  writes:
> There are an awful lot of places in our source tree where the error
> level is fixed.  We could invent a new construct, say ereport_error or
> so, that is just like ereport except that it takes no error-level
> parameter because it's hard-coded to ERROR.

> It would be a bit of a pain to change all of the existing call sites,
> but presumably it would dodge a lot of these issues about the way
> compilers optimize things, because we could simply say categorically
> that ereport_error NEVER returns.

Meh.  We've already got it working, and in a way that doesn't require
the compiler to understand __attribute__((noreturn)) --- it only has
to be aware that abort() doesn't return, in one fashion or another.
So I'm disinclined to run around and change a few thousand call sites,
much less expect extension authors to do so too.

(By my count there are about six thousand places we'd have to change.)

Note that whatever's going on on dugong is not a counterexample to
"got it working", because presumably dugong would also be misbehaving
if we'd used a different method of flagging all the ereports/elogs
as nonreturning.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sepgsql 1/3] add name qualified creation label

2013-01-15 Thread Kohei KaiGai
This patch adds to sepgsql the feature of name-qualified creation labels.

Background: on creation of a database object, sepgsql assigns a default
security label according to the security policy, which has a set of rules
to determine the label of a new object.
Usually, a new object inherits its parent object's label (e.g. a table is
the parent of a column), unless a particular type_transition rule in the
policy applies.
A type_transition rule describes a particular security label to use as the
default label of a new object for a given pair of client and parent object.
For example, the rule below says that columns created under a table labeled
"sepgsql_table_t" by a client labeled "staff_t" will get "staff_column_t"
instead of the table's label.
  TYPE_TRANSITION staff_t sepgsql_table_t:db_column staff_column_t;

Recently, this rule was enhanced to take a 5th argument for the object
name, which enables handling particular names as special cases.
It was originally designed to describe default security labels for files in
the /etc directory, because many applications put their own configuration
files there and the traditional type_transition rule could not describe all
the needed defaults.
We can port this concept to the database system as well.
One example is temporary objects created under the pg_temp schema.
If we could assign a special default label there, it would allow
unprivileged users (who cannot create persistent tables) to create
temporary tables with no risk of information leaking to other users.
We might also assign a special security label to system columns, and so on.

From the implementation perspective on the sepgsql side, all we need to do
is replace the old security_compute_create_raw() interface with the new
security_compute_create_name_raw().
If there are no name-qualified type_transition rules it behaves like the
existing API, so there is no backward-compatibility issue.

This patch can be applied on the latest master branch.

Thanks,
-- 
KaiGai Kohei 


sepgsql-v9.3-creation-label-with-name.v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get current query in a trigger function

2013-01-15 Thread Robert Haas
On Mon, Jan 14, 2013 at 9:38 PM, Vlad Arkhipov  wrote:
>>> Is there any simple way of getting a query for which a trigger was
>>> executed?
>>> debug_query_string and ActivePortal->sourceText return the top query when
>>> there are nested triggers.
>>
>> I believe - only if the trigger is written in C.
>>
> Yes, the trigger is written in C. But I didn't find any way to get access to
> the current EState or QueryDesc from a trigger function. The only common
> place of a trigger and the corresponding EState/QueryDesc structs seems to
> be CurrentMemoryContext in a trigger function, which ancestor has to be (?)
> EState->es_query_cxt. It's an ugly solution of course.

Ah, I see.  Yeah, I don't know of a good way to do that, although
there may be one.

> P.S. Is it a mistype EState->es_query_cxt? Should it be es_query_ctx?

I don't see why either of those is particularly better than the other.
 "context" has a t both before and after the x.

...Robert

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] COPY .. COMPRESSED

2013-01-15 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
> On Tue, Jan 15, 2013 at 10:55:04AM -0500, Stephen Frost wrote:
> > * Peter Eisentraut (pete...@gmx.net) wrote:
> > > Operating on compressed files transparently in file_fdw is obviously
> > > useful, but why only gzip?
> > 
> > This isn't really an argument, imv.  It's only gzip *right this moment*
> > because that's all that I implemented.  I've already offered to add
> > bzip2 or whatever else people would like.
> 
> And this leads to support-my-compression-binary-of-the-day mess.  Why
> not just allow them to do '|compression-binary'?

You're right, to clarify, for *file_fdw*, which is a backend-only
operation, the popen patch is great (thought I made that clear before).

The popen patch doesn't support the '|compression-binary' option through
the FE protocol.  Even if it did, it would only be available for
superusers as we can't allow regular users to run arbitrary commands on
the server-side.

Thanks,

Stephen




[HACKERS] Re: [pgsql-cluster-hackers] Save The Date: Cluster-Hackers meeting May 21st

2013-01-15 Thread Josh Berkus

> Given last year we didn't have more than 30 mins to spend on logical
> replication, how much time will we have to discuss non-open source
> systems?

Mostly the idea is to discuss what some of the proprietary systems need
from core postgres, especially where it overlaps with what OSS systems
need and where the sponsoring companies are

> 
> Can I check whether the agenda of this meeting is open?
> 

Given that we haven't even put up a wiki page yet or set an hourly
schedule, it's completely open.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] passing diff options to pg_regress

2013-01-15 Thread Robert Haas
On Mon, Jan 14, 2013 at 6:57 AM, Peter Eisentraut  wrote:
> I sometimes find it useful to view a regression test difference using
> other diff options, such as -u -w or more context.  There is currently
> no easy way to accomplish that.
>
> I suggest allowing to override the diff options using an environment
> variable, such as PG_REGRESS_DIFF_OPTS.  The patch is very small.

Great idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] json api WIP patch

2013-01-15 Thread Merlin Moncure
On Tue, Jan 15, 2013 at 1:04 PM, David Fetter  wrote:
> On Mon, Jan 14, 2013 at 07:52:56PM -0500, Andrew Dunstan wrote:
>>
>> On 01/14/2013 07:36 PM, Merlin Moncure wrote:
>> >While testing this I noticed that integer based 'get' routines are
>> >zero based -- was this intentional?  Virtually all other aspects of
>> >SQL are 1 based:
>> >
>> >postgres=# select json_get('[1,2,3]', 1);
>> >  json_get
>> >--
>> >  2
>> >(1 row)
>> >
>> >postgres=# select json_get('[1,2,3]', 0);
>> >  json_get
>> >--
>> >  1
>> >(1 row)
>>
>> Yes, it's intentional. SQL arrays might be 1-based by default, but
>> JavaScript arrays are not. JsonPath and similar gadgets treat the
>> arrays as zero-based. I suspect the Json-using community would not
>> thank us for being overly SQL-centric on this - and I say that as
>> someone who has always thought zero based arrays were a major design
>> mistake, responsible for countless off-by-one errors.
>
> Perhaps we could compromise by making arrays 0.5-based.

Well, I'm not prepared to argue with Andrew in this one.  It was
surprising behavior to me, but that's sample size one.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] Slow query: bitmap scan troubles

2013-01-15 Thread Bruce Momjian
On Mon, Jan 14, 2013 at 12:56:37PM -0500, Tom Lane wrote:
> > The reported behavior was that the planner would prefer to
> > sequential-scan the table rather than use the index, even if
> > enable_seqscan=off.  I'm not sure what the query looked like, but it
> > could have been something best implemented as a nested loop w/inner
> > index-scan.
> 
> Remember also that "enable_seqscan=off" merely adds 1e10 to the
> estimated cost of seqscans.  For sufficiently large tables this is not
> exactly a hard disable, just a thumb on the scales.  But I don't know
> what your definition of "extremely large indexes" is.

Wow, do we need to bump up that value based on larger modern hardware?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-15 Thread Robert Haas
On Sun, Jan 13, 2013 at 4:16 PM, Tom Lane  wrote:
> Andres Freund  writes:
>>> Basically, the aspects of this that I think are likely to be
>>> reproducible wins across different platforms are (a) teaching the
>>> compiler that elog(ERROR) doesn't return, and (b) reducing code size as
>>> much as possible.  The single-function change isn't going to help on
>>> either ground --- maybe it would have helped on (b) without the errno
>>> problem, but that's going to destroy any possible code size savings.
>
>> Agreed. I am happy to produce an updated patch unless youre already on
>> it?
>
> On it now (busy testing on some old slow boxes, else I'd be done already).

Just a random thought here...

There are an awful lot of places in our source tree where the error
level is fixed.  We could invent a new construct, say ereport_error or
so, that is just like ereport except that it takes no error-level
parameter because it's hard-coded to ERROR.

It would be a bit of a pain to change all of the existing call sites,
but presumably it would dodge a lot of these issues about the way
compilers optimize things, because we could simply say categorically
that ereport_error NEVER returns.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Curious buildfarm failures (fwd)

2013-01-15 Thread Tom Lane
Sergey Koposov  writes:
> And I do see the tblspc file left after the finish of "make check":
>   tmp_check/data/pg_tblspc/16385/PG_9.3_201212081/16384/16387

Interesting.  If the tests are run immediately after initdb, 16387
is the relfilenode assigned to table "foo" in the tablespace regression
test.  But why would only that table be left behind?  There are half
a dozen relations in that tablespace at the point of the DROP CASCADE.

BTW, I just finished trying to reproduce this on an IA64 machine
belonging to Red Hat, without success.  So that seems to eliminate
any possibility of the machine architecture being the trigger issue.
The compiler's still a likely cause though.

Anybody have a similar ICC version (dugong's says it is 10.0 20070809)
to try?  Also, Sergey, can you find a non-dot-zero release to try?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_ctl idempotent option

2013-01-15 Thread Boszormenyi Zoltan

On 2013-01-15 20:28, Bruce Momjian wrote:

On Tue, Jan 15, 2013 at 10:25:23AM -0500, Tom Lane wrote:

Alvaro Herrera  writes:

Vik Reykja wrote:

On Mon, Jan 14, 2013 at 4:22 PM, Tom Lane  wrote:

Idempotent is a ten-dollar word.  Can we find something that average
people wouldn't need to consult a dictionary to understand?

I disagree that we should dumb things down when the word means exactly what
we want and based on the rest of this thread is the only word or word
cluster that carries the desired meaning.

I'm not quite convinced that it means *exactly* what we want.  The
dictionary definition, according to my laptop's dictionary, is "denoting
an element of a set that is unchanged in value when multiplied or
otherwise operated on by itself".  I'm well aware that computer people
often use it to mean "an operation that doesn't change the system state
if the state is already what's wanted", but I think that's probably an
abuse of the mathematical usage.  And in any case, I'm not sure that
non-hackers would immediately recognize the term, nor be enlightened by
their dictionaries.  But ...

I have heard idempotent used several times by our folks, and I didn't
know what it meant either.  I figured it was a "strong item".  ;-)  I
just looked it up.


The Hungarian math term for idempotent translates literally as
"leave it in place". The term also has a slang usage for beating somebody
up. ;-)

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_ctl idempotent option

2013-01-15 Thread Bruce Momjian
On Tue, Jan 15, 2013 at 10:55:41AM -0500, Peter Eisentraut wrote:
> On 1/14/13 10:22 AM, Tom Lane wrote:
> > Idempotent is a ten-dollar word.  Can we find something that average
> > people wouldn't need to consult a dictionary to understand?
> 
> My suggestion in the original thread was --oknodo, but people didn't
> like that either.

That's Japanese for idempotent.  ;-)  LOL

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_ctl idempotent option

2013-01-15 Thread Bruce Momjian
On Tue, Jan 15, 2013 at 10:25:23AM -0500, Tom Lane wrote:
> Alvaro Herrera  writes:
> > Vik Reykja wrote:
> >> On Mon, Jan 14, 2013 at 4:22 PM, Tom Lane  wrote:
> >>> Idempotent is a ten-dollar word.  Can we find something that average
> >>> people wouldn't need to consult a dictionary to understand?
> 
> >> I disagree that we should dumb things down when the word means exactly what
> >> we want and based on the rest of this thread is the only word or word
> >> cluster that carries the desired meaning.
> 
> I'm not quite convinced that it means *exactly* what we want.  The
> dictionary definition, according to my laptop's dictionary, is "denoting
> an element of a set that is unchanged in value when multiplied or
> otherwise operated on by itself".  I'm well aware that computer people
> often use it to mean "an operation that doesn't change the system state
> if the state is already what's wanted", but I think that's probably an
> abuse of the mathematical usage.  And in any case, I'm not sure that
> non-hackers would immediately recognize the term, nor be enlightened by
> their dictionaries.  But ...

I have heard idempotent used several times by our folks, and I didn't
know what it meant either.  I figured it was a "strong item".  ;-)  I
just looked it up.
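
[Editorial note: the two usages being contrasted, written out.]

    a \cdot a = a         (algebra: an idempotent element)
    f(f(x)) = f(x)        (computing: an operation that changes nothing when repeated)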

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] count(*) of zero rows returns 1

2013-01-15 Thread Shaun Thomas

On 01/15/2013 01:18 PM, Bruce Momjian wrote:


AFAICR, 'dual' table has one column named 'DUMMY' and one row with
value, single character X.


How elegant.  :-(


Let's see what EnterpriseDB produces:

test=> select * from dual;

 dummy
---
 X
(1 row)

Yep, elegant gets my vote. ;) But then again, Oracle also created 
VARCHAR2 and told everyone to start using that, just in case they ever 
modified VARCHAR to be SQL compliant. Thankfully we have you guys, so PG 
won't go down a similar route.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com

__

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to 
this email


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] COPY .. COMPRESSED

2013-01-15 Thread Bruce Momjian
On Tue, Jan 15, 2013 at 10:55:04AM -0500, Stephen Frost wrote:
> * Peter Eisentraut (pete...@gmx.net) wrote:
> > Operating on compressed files transparently in file_fdw is obviously
> > useful, but why only gzip?
> 
> This isn't really an argument, imv.  It's only gzip *right this moment*
> because that's all that I implemented.  I've already offered to add
> bzip2 or whatever else people would like.

And this leads to support-my-compression-binary-of-the-day mess.  Why
not just allow them to do '|compression-binary'?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] count(*) of zero rows returns 1

2013-01-15 Thread Bruce Momjian
On Mon, Jan 14, 2013 at 10:25:39PM -0500, Gurjeet Singh wrote:
> On Mon, Jan 14, 2013 at 3:09 PM, David Johnston  wrote:
> 
> What does "SELECT * FROM dual" in Oracle yield?
> 
> 
> AFAICR, 'dual' table has one column named 'DUMMY' and one row with value,
> single character X.

How elegant.  :-(

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] count(*) of zero rows returns 1

2013-01-15 Thread Bruce Momjian
On Mon, Jan 14, 2013 at 10:47:58PM -0500, Gurjeet Singh wrote:
> Interesting to note that SELECT * FROM table_with_zero_cols does not complain
> of anything.
>  
> postgres=# select * from test1;
> --
> (0 rows)
> 
> This I believe result of the fact that we allow user to drop all columns of a
> table.
> 
> On a side note, Postgres allows me to do this (which I don't think is a bug or
> useless): I inserted some rows into a table, and then dropped the columns. The
> resulting table has no columns, but live rows.
> 
> postgres=# select * from test_0_col_table ;
> --
> (20 rows)

I found this funny.
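
[Editorial reproduction of the zero-column-table case Gurjeet describes,
assuming a hypothetical starting column "a".]

CREATE TABLE test_0_col_table (a int);
INSERT INTO test_0_col_table SELECT generate_series(1, 20);
ALTER TABLE test_0_col_table DROP COLUMN a;
SELECT * FROM test_0_col_table;
-- --
-- (20 rows)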

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pkg-config files for libpq and ecpg

2013-01-15 Thread Peter Eisentraut
I'll take another stab at providing pkg-config files for the client-side
libraries.

The main reason this time around is that this works a lot better (or at
all) for multi-arch library installations.

Another is that pkg-config has become a lot smarter and flexible over
the years, and it's probably a better choice for users who are already
used to its interface.  There is a lot of confusion, for example, about
what pg_config --libs really means.  We often evade that by saying,
well, those are the libraries we linked with, but there is a lack of
clarity in that context about what libraries a user should link with.

The way it's implemented, it doesn't require manual maintenance, so it
should not be much of a bother.

A side issue that arose: libecpg_compat is linked with libpq, but
doesn't seem to use it.  This was added many years ago in
cd75f94dafd43358305811b7576ad75d889097e3, but it doesn't appear to be
required anymore.  Needs some checking.
diff --git a/.gitignore b/.gitignore
index 1e15ce5..b2854d9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -21,6 +21,7 @@ lcov.info
 win32ver.rc
 *.exe
 lib*dll.def
+lib*.pc
 
 # Local excludes in root directory
 /GNUmakefile
diff --git a/src/Makefile.shlib b/src/Makefile.shlib
index 4da2f10..d9728ba 100644
--- a/src/Makefile.shlib
+++ b/src/Makefile.shlib
@@ -87,6 +87,7 @@ shlib_bare= lib$(NAME)$(DLSUFFIX)
 # Testing the soname variable is a reliable way to determine whether a
 # linkable library is being built.
 soname = $(shlib_major)
+pkgconfigdir = $(libdir)/pkgconfig
 else
 # Naming convention for dynamically loadable modules
 shlib  = $(NAME)$(DLSUFFIX)
@@ -305,6 +306,7 @@ all-lib: all-shared-lib
 ifdef soname
 # no static library when building a dynamically loadable module
 all-lib: all-static-lib
+all-lib: lib$(NAME).pc
 endif
 
 all-static-lib: $(stlib)
@@ -388,6 +390,18 @@ $(stlib): $(shlib) $(DLL_DEFFILE) | $(SHLIB_PREREQS)
 endif # PORTNAME == cygwin || PORTNAME == win32
 
 
+%.pc: $(MAKEFILE_LIST)
+   echo 'Name: lib$(NAME)' >$@
+   echo 'Description: PostgreSQL lib$(NAME) library' >>$@
+   echo 'Url: http://www.postgresql.org/' >>$@
+   echo 'Version: $(VERSION)' >>$@
+   echo 'Requires: ' >>$@
+   echo 'Requires.private: $(PKG_CONFIG_REQUIRES_PRIVATE)' >>$@
+   echo 'Cflags: -I$(includedir)' >>$@
+   echo 'Libs: -L$(libdir) -l$(NAME)' >>$@
+   echo 'Libs.private: $(filter-out $(PKG_CONFIG_REQUIRES_PRIVATE:lib%=-l%),$(filter-out -L..%, $(SHLIB_LINK)))' >>$@
+
+
 # We need several not-quite-identical variants of .DEF files to build
 # DLLs for Windows.  These are made from the single source file
 # exports.txt.  Since we can't assume that Windows boxes will have
@@ -430,8 +444,12 @@ endif # SHLIB_EXPORTS
 install-lib: install-lib-shared
 ifdef soname
 install-lib: install-lib-static
+install-lib: install-lib-pc
 endif
 
+install-lib-pc: lib$(NAME).pc installdirs-lib
+   $(INSTALL_DATA) $< '$(DESTDIR)$(pkgconfigdir)/lib$(NAME).pc'
+
 install-lib-static: $(stlib) installdirs-lib
$(INSTALL_STLIB) $< '$(DESTDIR)$(libdir)/$(stlib)'
 ifeq ($(PORTNAME), darwin)
@@ -467,7 +485,7 @@ endif
 
 installdirs-lib:
 ifdef soname
-   $(MKDIR_P) '$(DESTDIR)$(libdir)'
+   $(MKDIR_P) '$(DESTDIR)$(libdir)' '$(DESTDIR)$(pkgconfigdir)'
 else
$(MKDIR_P) '$(DESTDIR)$(pkglibdir)'
 endif
@@ -483,7 +501,8 @@ ifdef soname
rm -f '$(DESTDIR)$(libdir)/$(stlib)'
rm -f '$(DESTDIR)$(libdir)/$(shlib_bare)' \
  '$(DESTDIR)$(libdir)/$(shlib_major)' \
- '$(DESTDIR)$(libdir)/$(shlib)'
+ '$(DESTDIR)$(libdir)/$(shlib)' \
+ '$(DESTDIR)$(pkgconfigdir)/lib$(NAME).pc'
 else # no soname
rm -f '$(DESTDIR)$(pkglibdir)/$(shlib)'
 endif # no soname
@@ -495,7 +514,7 @@ endif # no soname
 
 .PHONY: clean-lib
 clean-lib:
-   rm -f $(shlib) $(shlib_bare) $(shlib_major) $(stlib) $(exports_file)
+   rm -f $(shlib) $(shlib_bare) $(shlib_major) $(stlib) $(exports_file) lib$(NAME).pc
 
 ifneq (,$(SHLIB_EXPORTS))
 maintainer-clean-lib:
diff --git a/src/interfaces/ecpg/compatlib/Makefile b/src/interfaces/ecpg/compatlib/Makefile
index 00adcdb..6ae1493 100644
--- a/src/interfaces/ecpg/compatlib/Makefile
+++ b/src/interfaces/ecpg/compatlib/Makefile
@@ -18,10 +18,10 @@ SO_MAJOR_VERSION= 3
 SO_MINOR_VERSION= 5
 
 override CPPFLAGS := -I../include -I$(top_srcdir)/src/interfaces/ecpg/include \
-   -I$(libpq_srcdir) -I$(top_srcdir)/src/include/utils $(CPPFLAGS)
+   -I$(top_srcdir)/src/include/utils $(CPPFLAGS)
 override CFLAGS += $(PTHREAD_CFLAGS)
 
-SHLIB_LINK = -L../ecpglib -lecpg -L../pgtypeslib -lpgtypes $(libpq) \
+SHLIB_LINK = -L../ecpglib -lecpg -L../pgtypeslib -lpgtypes \
$(filter -lintl -lm, $(LIBS)) $(PTHREAD_LIBS)
 SHLIB_PREREQS = submake-ecpglib submake-pgtypeslib
 
@@ -32,6 +32,8 @@ LIBS := $(filter-out -lpgport, $(LIBS))
 
 OBJS= informix.o $(filter snprintf.o, $(LIBOBJS))
 
+PKG_CONFIG_REQUIRES_PRIVATE = libecpg libpgtypes
+
 all: all-lib
 
 .
