Re: [HACKERS] Avoiding shutdown checkpoint at failover

2011-11-13 Thread Simon Riggs
On Tue, Nov 1, 2011 at 12:11 PM, Simon Riggs si...@2ndquadrant.com wrote:

 When I say "skip the shutdown checkpoint", I mean remove it from the
 critical path of required actions at the end of recovery. We can still
 have a normal checkpoint kicked off at that time, but that no longer
 needs to be on the critical path.

 Any problems foreseen? If not, looks like a quick patch.

Patch attached for discussion/review.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


fast_failover.v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] MPI programming in postgreSQL backend source code

2011-11-13 Thread Rudyar Cortés

Hello,

I'm new to programming in the PostgreSQL source code.
Is it possible to use MPI functions in the PostgreSQL source code?

Help me please!

Best Regards.

Rudyar.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [patch] Include detailed information about a row failing a CHECK constraint into the error message

2011-11-13 Thread Jan Kundrát
On 11/10/11 00:48, José Arthur Benetasso Villanova wrote:
 First, I couldn't apply it as in the email, even in REL9_0_STABLE: the
 offset doesn't look right. Which commit is your repository on?

Hi Jose, thanks for looking at the patch. It's based on
b07b2bdc570cfbb39564c8a70783dbce1edcb3d6, which was REL9_0_STABLE at the
time I made the change.

Cheers,
Jan



smime.p7s
Description: S/MIME Cryptographic Signature


[HACKERS] BuildFarm - Jaguar Check Failure

2011-11-13 Thread Mehdi Maache

Hi,

I don't know if you're already aware, but just in case:

I have jaguar failing at the check stage ( 
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=jaguar&dt=2011-11-03%2023%3A05%3A01 
) since 2011-11-03, and I don't know what the cause is.


I built on another system with --nosend and it seems I have the same problem:

test case : rangetypes   ... FAILED

regards,

Mehdi MAACHE

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [patch] Include detailed information about a row failing a CHECK constraint into the error message

2011-11-13 Thread Jan Kundrát
Hi José and Robert, thanks for your time and the reviews. Comments below.

On 11/10/11 03:47, Robert Haas wrote:
 It does this already, without this patch.  This patch is about CHECK
 constraints, not UNIQUE ones.

That's right. This is how to check what the patch changes:

jkt=> CREATE TABLE tbl (name TEXT PRIMARY KEY, a INTEGER CHECK (a > 0));
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"tbl_pkey" for table "tbl"
CREATE TABLE
jkt=> INSERT INTO tbl (name, a) VALUES ('x', 10);
INSERT 0 1
jkt=> UPDATE tbl SET a = -a;
ERROR:  new row for relation "tbl" violates check constraint "tbl_a_check"
DETAIL:  New row with data (x, -10) violates check constraint "tbl_a_check".

The last line, the detailed error message, is added by the patch.

 I believe we've previously rejected patches along these lines on the
 grounds that the output could realistically be extremely long.
 Imagine that you have a table with an integer primary key column and a
 text column.  The integer column has a check constraint on it.  But
 the text column might contain a kilobyte, or a megabyte, or even a
 gigabyte worth of text, and we don't necessarily want to spit that all
 out on an error.  For unique constraints, we only emit the values that
 are part of the constraint, which in most cases will be relatively
 short (if they're more than 8kB, they won't fit into an index block);
 but here the patch wants to dump the whole tuple, and that could be
 really big.

That's an interesting thought. I suppose the same thing is an issue with
unique keys, but they tend to not be created over huge columns, so it
isn't really a problem, right?

Would you object to a patch which outputs just the first 8kB of each
column? Having at least some form of context is very useful in my case.
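
For contrast, in the unique-constraint case Robert describes, only the
indexed key values are echoed. Against the example table above, a
duplicate insert would produce roughly the following (reconstructed for
illustration, not verbatim output):

jkt=> INSERT INTO tbl (name, a) VALUES ('x', 5);
ERROR:  duplicate key value violates unique constraint "tbl_pkey"
DETAIL:  Key (name)=(x) already exists.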

(And as a side note, I'm not really familiar with Postgres' internals,
so it took me roughly six hours to arrive at a working patch from the
very start. I'd therefore welcome pointers about the best way to achieve
that with Postgres' string stream interface.)

With kind regards,
Jan

-- 
Trojita, a fast e-mail client -- http://trojita.flaska.net/



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Re: [patch] Include detailed information about a row failing a CHECK constraint into the error message

2011-11-13 Thread Pavel Stehule
Hello

 (And as a side note, I'm not really familiar with Postgres' internals,
 so it took me roughly six hours to arrive at a working patch from the
 very start. I'd therefore welcome pointers about the best way to achieve
 that with Postgres' string stream interface.)



http://www.pgsql.cz/index.php/C_a_PostgreSQL_-_intern%C3%AD_mechanismy

Regards

Pavel

 With kind regards,
 Jan

 --
 Trojita, a fast e-mail client -- http://trojita.flaska.net/



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_restore --no-post-data and --post-data-only

2011-11-13 Thread Matteo Beccati

Hi Andrew,

On 13/11/2011 02:56, Andrew Dunstan wrote:

Here is a patch for that for pg_dump. The sections provided for are
pre-data, data and post-data, as discussed elsewhere. I still feel that
anything finer grained should be handled via pg_restore's --use-list
functionality. I'll provide a patch to do the same switch for pg_restore
shortly.

Adding to the commitfest.


FWIW, I've tested the patch as I've recently needed to build a custom 
splitting script for a project and the patch seemed to be a much more 
elegant solution. As far as I can tell, it works great and the output 
matches the result of my script.


The only little thing I've noticed is a missing closing ')' in the 
--help message.



Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FDW system columns

2011-11-13 Thread Peter Eisentraut
On Sun, 2011-11-13 at 00:58 +0000, Thom Brown wrote:
 Is there a cost to having them there?  Could there be tools that might
 break if the columns were no longer available?

Doubtful.  Views don't have system columns either.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Detach/attach database

2011-11-13 Thread Thom Brown
Hi,

I don't know if this has been discussed before, but would it be
feasible to introduce the ability to detach and attach databases? (if
you're thinking "stop right there" skip to the end)  What I had in
mind would be to do something like the following:

SELECT pg_detach_database('my_database')

 pg_detach_database
--------------------
 base/16384
(1 row)

Then this database would no longer be accessible.  The system tables
would be updated to reflect the removal of this database, and some
kind of manifest file would be produced in that database directory.
The database represented by that returned directory could then be
moved elsewhere (or deleted if the user so wished, though deletion
isn't the real benefit).

But then if another cluster were running with the same version and
architecture of the cluster we just detached our database from, we
could copy that directory to the base directory of that other cluster
(assuming the OID of the database didn't already conflict with an
existing one), then run:

SELECT pg_attach_database('16384', optional_tablespace_name_here);

Then the usual version/architecture/platform checks would happen, the
manifest would be read to populate the system tables and check for
conflicts, and the database would be brought online.

And if I haven't already asked too much, something like this could be run:

SELECT pg_start_clone('my_database');

 pg_start_clone
----------------
 base/16384
(1 row)

You may now be able to infer where this notion came from, when someone
asked if you can clone databases without kicking users off.  However,
this isn't a schema-only copy, but naturally contains data as well.

So that function above would be like pg_start_backup() except that we
wouldn't prevent writes to other stable databases, just the candidate
one.  I personally don't see how this could work with the way we
currently replay WAL files.

But the idea is that it would create the manifest file like
pg_detach_database() except it wouldn't detach the database, and users
could continue to use the database.

Then the user could copy away the database directory elsewhere, then call:

SELECT pg_stop_clone('my_database');

And in theory the user could rename the copy of the directory and move
it back and reattach it as an identical copy (or identical at the time
of copying it).

The most obvious problems I see are related to permissions.  All the
object level permissions would have to exist in the destination
database (which is fine if you're copying it to the same cluster), but
even if they do exist, I suspect the OIDs of those users would need to
be the same.  Then there's extensions and collations.  If either of
those are unavailable on the destination cluster, it couldn't work
either.  But then some kind of error message communicating the missing
component could be returned when attempting to attach it.

Also I don't know how you could pause WAL replay for an individual database.

But use cases for this would be:
- Backing up select databases for quick re-attachment without lengthy restores
- User running out of disk space and wants to move large databases to
another server quickly
- A large static database is wanted on another server
- A copy of a large database is wanted on the same cluster

It's just a vague idea, and I'm kind of expecting responses to begin
with "Well for a start, this couldn't possibly begin to work
because..." but that's par for the course. ;)

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQLDA fix for ECPG

2011-11-13 Thread Michael Meskes

This must have been a cut and paste bug and is incorrect
in 9.0.x, 9.1.x and GIT HEAD. It would be nice to have it
applied before the next point releases come out.


Applied, thanks for the patch.

Michael

--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at googlemail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Detach/attach database

2011-11-13 Thread Simon Riggs
On Sun, Nov 13, 2011 at 1:13 PM, Thom Brown t...@linux.com wrote:

 I don't know if this has been discussed before, but would it be
 feasible to introduce the ability to detach and attach databases? (if
 you're thinking stop right there skip to the end)  What I had in
 mind would be to do something like the following:

That would be better done at the tablespace level, and then the
feature becomes "transportable tablespaces". Which seems like a good
and useful idea to me.

 You may now be able to infer where this notion came from, when someone
 asked if you can clone databases without kicking users off.  However,
 this isn't a schema-only copy, but naturally contains data as well.

The OP wanted to do this without freezing activity on the database,
which is not easy...

OTOH we can do a backup of just a single database and then filter
recovery at database level to produce just a single copy of another
database on its own server, if anyone wanted that.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why do we need two snapshots per query?

2011-11-13 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 If we could be certain that a query was being executed immediately

... that is, with the same snapshot ...

 then it would be possible to simplify expressions using stable
 functions as if they were constants. My earlier patch did exactly
 that.

Mph.  I had forgotten about that aspect of it.  I think that it's
very largely superseded by Marti Raudsepp's pending patch:
https://commitfest.postgresql.org/action/patch_view?id=649
which does more and doesn't require any assumption that plan and
execution snapshots are the same.

Now you're going to say that that doesn't help for failure to prove
partial index or constraint conditions involving stable functions,
and my answer is going to be that that isn't an interesting use-case.
Partial index conditions *must* be immutable, and constraint conditions
*should* be.  As far as partitioning goes, the correct solution there
is to move the partition selection to run-time, so we should not be
contorting query semantics to make incremental performance improvements
with the existing partitioning infrastructure.

I remain of the opinion that Robert's proposal is a bad idea.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQLDA fix for ECPG

2011-11-13 Thread Tom Lane
Boszormenyi Zoltan z...@cybertec.at writes:
 I had a report about ECPG code crashing which involved
 a query using a date field. Attached is a one-liner fix to make
 the date type's offset computed consistently across
 sqlda_common_total_size(), sqlda_compat_total_size() and
 sqlda_native_total_size().

Is this really the only issue there?  I notice discrepancies among those
three routines for some other types too, notably ECPGt_timestamp and
ECPGt_interval.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why do we need two snapshots per query?

2011-11-13 Thread Simon Riggs
On Sun, Nov 13, 2011 at 4:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 As far as partitioning goes, the correct solution there
 is to move the partition selection to run-time, so we should not be
 contorting query semantics to make incremental performance improvements
 with the existing partitioning infrastructure.

Agreed, but I think we need both planning and execution time
awareness, just as we do with index-only scans.

That's what I'd like to be able to do: link planning and execution.

It's all very well to refuse individual cases where linkage is
required, but ISTM clear that there are many possible uses of being
able to tell whether a plan is one-shot or not, and nothing lost by
allowing that information (a boolean) to pass to the executor.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Detach/attach database

2011-11-13 Thread Tom Lane
Thom Brown t...@linux.com writes:
 I don't know if this has been discussed before, but would it be
 feasible to introduce the ability to detach and attach databases? (if
 you're thinking "stop right there" skip to the end)

... skipping ...

 It's just a vague idea, and I'm kind of expecting responses to begin
 with "Well for a start, this couldn't possibly begin to work
 because..." but that's par for the course. ;)

The main reason this doesn't work is XID management.

It's barely possible you could make it work if you first locked all
other sessions out of the DB and then froze every XID in the database,
but that's a sufficiently heavyweight operation to make it of dubious
value.
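
The freezing step would amount to something like the following, run
while connected to the database being detached (a sketch, assuming it
were done with existing tools):

-- with all other sessions locked out of the target database:
VACUUM (FREEZE);  -- leave no live XIDs behind in its heap pages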

You'd also have to think of some way to ensure that page LSNs in the
database are lower than the current WAL endpoint in the receiver.

The other thing I'd be concerned about is inconsistency with the global
system catalogs in the receiving installation.  Consider roles for
example: the receiver might not have the same set of roles, probably
wouldn't have the same OIDs for those roles, and definitely would be
missing the pg_shdepend entries that describe which objects in the
transported database are owned by which roles.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why do we need two snapshots per query?

2011-11-13 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 It's all very well to refuse individual cases where linkage is
 required, but ISTM clear that there are many possible uses of being
 able to tell whether a plan is one-shot or not, and nothing lost by
 allowing that information (a boolean) to pass to the executor.

It's an interconnection between major modules that IMO we don't need.
Having the executor behave differently depending on the planning path
the query took creates complexity, which creates bugs.  You haven't
produced any use-case at all that convinces me that it's worth the risk;
nor do I believe there are lots more use-cases right around the corner.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Detach/attach database

2011-11-13 Thread Andres Freund
On Sunday, November 13, 2011 13:13:11 Thom Brown wrote:
 Hi,
 
 I don't know if this has been discussed before, but would it be
 feasible to introduce the ability to detach and attach databases? (if
 you're thinking "stop right there" skip to the end)  
 It's just a vague idea, and I'm kind of expecting responses to begin
  with "Well for a start, this couldn't possibly begin to work
  because..." but that's par for the course. ;)
You would have to do quite some surgery because of OIDs from shared tables. I 
don't think that's easily doable.

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Detach/attach database

2011-11-13 Thread Thom Brown
On 13 November 2011 16:42, Tom Lane t...@sss.pgh.pa.us wrote:
 Thom Brown t...@linux.com writes:
 I don't know if this has been discussed before, but would it be
 feasible to introduce the ability to detach and attach databases? (if
 you're thinking "stop right there" skip to the end)

 ... skipping ...

 It's just a vague idea, and I'm kind of expecting responses to begin
 with "Well for a start, this couldn't possibly begin to work
 because..." but that's par for the course. ;)

 The main reason this doesn't work is XID management.

 It's barely possible you could make it work if you first locked all
 other sessions out of the DB and then froze every XID in the database,
 but that's a sufficiently heavyweight operation to make it of dubious
 value.

 You'd also have to think of some way to ensure that page LSNs in the
 database are lower than the current WAL endpoint in the receiver.

 The other thing I'd be concerned about is inconsistency with the global
 system catalogs in the receiving installation.  Consider roles for
 example: the receiver might not have the same set of roles, probably
 wouldn't have the same OIDs for those roles, and definitely would be
 missing the pg_shdepend entries that describe which objects in the
 transported database are owned by which roles.

I feared such a non-traversable terrain would prevent it being
possible.  Oh well.  Thanks for the explanation though.

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why do we need two snapshots per query?

2011-11-13 Thread Robert Haas
On Sun, Nov 13, 2011 at 11:09 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 If we could be certain that a query was being executed immediately

 ... that is, with the same snapshot ...

 then it would be possible to simplify expressions using stable
 functions as if they were constants. My earlier patch did exactly
 that.

 Mph.  I had forgotten about that aspect of it.  I think that it's
 very largely superseded by Marti Raudsepp's pending patch:
 https://commitfest.postgresql.org/action/patch_view?id=649
 which does more and doesn't require any assumption that plan and
 execution snapshots are the same.

 Now you're going to say that that doesn't help for failure to prove
 partial index or constraint conditions involving stable functions,
 and my answer is going to be that that isn't an interesting use-case.
 Partial index conditions *must* be immutable, and constraint conditions
 *should* be.  As far as partitioning goes, the correct solution there
 is to move the partition selection to run-time, so we should not be
 contorting query semantics to make incremental performance improvements
 with the existing partitioning infrastructure.

 I remain of the opinion that Robert's proposal is a bad idea.

Wait a minute.  I can understand why you think it's a bad idea to
preserve a snapshot across multiple protocol messages
(parse/bind/execute), but why or how would it be a bad idea to keep
the same snapshot between planning and execution when the whole thing
is being done as a unit?  You haven't offered any real justification
for that position, and it seems to me that if anything the semantics
of such a thing are far *less* intuitive than it would be to do the
whole thing under a single snapshot.  The whole point of snapshot
isolation is that our view of the database doesn't change mid-query;
and yet you are now saying that's exactly the behavior we should have.
 That seems exactly backwards to me.

I also think you are dismissing Simon's stable-expression-folding
proposal far too lightly.  I am not sure that the behavior he wants is
safe given the current details of our implementation - or even with my
patch; I suspect a little more than that is needed - but I am pretty
certain it's the behavior that users want and expect, and we should be
moving toward it, not away from it.  I have seen a significant number
of cases over the years where the query optimizer generated a bad plan
because it did less constant-folding than the user expected.  Users do
not walk around thinking about the fact that the planner and executor
are separate modules and therefore probably should use separate
snapshots.  They expect their query to see a consistent view of the
database.  Period.
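
To make the constant-folding expectation concrete, consider a
hypothetical query of the kind involved (table and column names
invented): users filtering a partitioned table with a stable expression
generally expect it to be folded to a constant and used for partition
exclusion, but because now() is stable rather than immutable the
planner cannot currently do that:

SELECT count(*) FROM log_events
WHERE created_at >= now() - interval '1 day';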

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why do we need two snapshots per query?

2011-11-13 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sun, Nov 13, 2011 at 11:09 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I remain of the opinion that Robert's proposal is a bad idea.

 Wait a minute.  I can understand why you think it's a bad idea to
 preserve a snapshot across multiple protocol messages
 (parse/bind/execute), but why or how would it be a bad idea to keep
 the same snapshot between planning and execution when the whole thing
 is being done as a unit?  You haven't offered any real justification
 for that position,

It's not hard to come by: execution should proceed with the latest
available view of the database.

 and it seems to me that if anything the semantics
 of such a thing are far *less* intuitive than it would be to do the
 whole thing under a single snapshot.

In that case you must be of the opinion that extended query protocol
is a bad idea and we should get rid of it, and the same for prepared
plans of all types.  What you're basically proposing is that simple
query mode will act differently from other ways of submitting a query,
and I don't think that's a good idea.  It might be sane if planning
could be assumed to take zero time, but that's hardly true.

 I also think you are dismissing Simon's stable-expression-folding
 proposal far too lightly.  I am not sure that the behavior he wants is
 safe given the current details of our implementation - or even with my
 patch; I suspect a little more than that is needed - but I am pretty
 certain it's the behavior that users want and expect, and we should be
 moving toward it, not away from it.  I have seen a significant number
 of cases over the years where the query optimizer generated a bad plan
 because it did less constant-folding than the user expected.

This is just FUD, unless you can point to specific examples where
Marti's patch won't fix it.  If that patch crashes and burns for
some reason, then we should revisit this idea; but if it succeeds
it will cover more cases than plan-time constant folding could.

One of the reasons I don't want to go this direction is that it would
re-introduce causes of extended query protocol having poor performance
relative to simple protocol.  That's not something that users find
intuitive or desirable, either.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BuildFarm - Jaguar Check Failure

2011-11-13 Thread Tom Lane
Mehdi Maache ml...@ide-environnement.com writes:
 test case : rangetypes   ... FAILED

Hmm ... jaguar is the CLOBBER_CACHE_ALWAYS machine, isn't it.
I bet this reflects a cache flush bug in the new range-types code.
That would explain the fact that some other machines show the same
regression diff intermittently, too, such as
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pika&dt=2011-11-12%2022%3A46%3A53

Trying to reproduce it here...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why do we need two snapshots per query?

2011-11-13 Thread Kevin Grittner
 Tom Lane  wrote:
 Robert Haas  writes:
 
 I can understand why you think it's a bad idea to preserve a
 snapshot across multiple protocol messages (parse/bind/execute),
 but why or how would it be a bad idea to keep the same snapshot
 between planning and execution when the whole thing is being done
 as a unit? You haven't offered any real justification for that
 position,

 It's not hard to come by: execution should proceed with the latest
 available view of the database.
 
I don't think that stands as an intuitively obvious assertion.  I
think we need to see the argument which leads to that conclusion.
 
 and it seems to me that if anything the semantics of such a thing
 are far *less* intuitive than it would be to do the whole thing
 under a single snapshot.

 In that case you must be of the opinion that extended query
 protocol is a bad idea and we should get rid of it, and the same
 for prepared plans of all types. What you're basically proposing is
 that simple query mode will act differently from other ways of
 submitting a query, and I don't think that's a good idea.
 
In what way would that difference be user-visible?
 
 One of the reasons I don't want to go this direction is that it
 would re-introduce causes of extended query protocol having poor
 performance relative to simple protocol. That's not something that
 users find intuitive or desirable, either.
 
If the simple protocol can perform better than the extended protocol,
it hardly seems like a good idea to intentionally cripple the fast
one to keep them at the same performance.  It seems like it would be
better to document the performance difference so that people can
weigh the trade-offs.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Poor use of caching in rangetypes code

2011-11-13 Thread Tom Lane
While digging around for the cause of the buildfarm failures in the new
rangetypes code, I noticed that range_gettypinfo thinks it can memcpy
the result of fmgr_info().  This is not cool.  It's true that fn_extra
is likely to be NULL at the instant the copy occurs, but what will
happen if the called function is using fn_extra is that on *each call*
it will have to re-create whatever data it's caching there.  That's not
only bad for performance but it means a per-call memory leak.

It looks like the reason that it's done this way is that some functions
call range_gettypinfo twice, presumably for types that could be
different ... but if they actually were different, then again the
caching would be counterproductive.

I think we need to fix this so that a cached RangeTypeInfo is used
in-place.  The places that need two of them will need some extra code
to set up a two-RangeTypeInfo cache area.

Unless you see a hole in this analysis, I will go make the changes.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Cause of intermittent rangetypes regression test failures

2011-11-13 Thread Tom Lane
Well, I was overthinking the question of why rangetypes sometimes fails
with

  select count(*) from test_range_gist where ir << int4range(100,500);
! ERROR:  input range is empty

Turns out that happens whenever auto-analyze has managed to process
test_range_gist before we get to this part of the test.  jaguar
is more likely to see this because CLOBBER_CACHE_ALWAYS slows down the
rangetypes code to a really staggering extent, but obviously it can
happen anywhere.  If the table has been analyzed, then the
most_common_values array for column ir will consist of 
{empty}
which is entirely correct since that value accounts for 16% of the
table.  And then, when mcv_selectivity tries to estimate the selectivity
of the << condition, it applies range_before to the empty range along
with the int4range(100,500) value, and range_before spits up.
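
Based on the error quoted above, the failure should be reproducible
directly, without involving the planner at all:

select 'empty'::int4range << int4range(100,500);
ERROR:  input range is empty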

I think this demonstrates that the current definition of range_before is
broken.  It is not reasonable for it to throw an error on a perfectly
valid input ... at least, not unless you'd like to mark it VOLATILE so
that the planner will not risk calling it.

What shall we have it do instead?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Regression tests fail once XID counter exceeds 2 billion

2011-11-13 Thread Tom Lane
While investigating bug #6291 I was somewhat surprised to discover
$SUBJECT.  The cause turns out to be this kluge in alter_table.sql:

select virtualtransaction
from pg_locks
where transactionid = txid_current()::integer

which of course starts to fail with "integer out of range" as soon as
txid_current() gets past 2^31.  Right now, since there is no cast
between xid and any integer type, and no comparison operator except the
dubious xideqint4 one, the only way we could fix this is something
like

where transactionid::text = (txid_current() % (2^32))::text

which is surely pretty ugly.  Is it worth doing something less ugly?
I'm not sure if there are any other use-cases for this type of
comparison, but if there are, seems like it would be sensible to invent
a function along the lines of

txid_from_xid(xid) returns bigint

that plasters on the appropriate epoch value for an
assumed-to-be-current-or-recent xid, and returns something that squares
with the txid_snapshot functions.  Then the test could be coded without
kluges as

where txid_from_xid(transactionid) = txid_current()
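
For illustration only, here is a rough SQL-level sketch of those
semantics; a real txid_from_xid() would presumably be written in C, and
the epoch arithmetic below assumes the xid is current or recent:

create function txid_from_xid(x xid) returns bigint language sql as $$
  -- reconstruct epoch<<32 from txid_current(), then add the xid,
  -- stepping back one epoch if the xid appears to be in the future
  select (cur - low) + x_val
         - case when x_val > low then 4294967296 else 0 end
  from (select txid_current() as cur,
               txid_current() % 4294967296 as low,
               $1::text::bigint as x_val) s;
$$;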

Thoughts?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why do we need two snapshots per query?

2011-11-13 Thread Robert Haas
On Sun, Nov 13, 2011 at 12:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Wait a minute.  I can understand why you think it's a bad idea to
 preserve a snapshot across multiple protocol messages
 (parse/bind/execute), but why or how would it be a bad idea to keep
 the same snapshot between planning and execution when the whole thing
 is being done as a unit?  You haven't offered any real justification
 for that position,

 It's not hard to come by: execution should proceed with the latest
 available view of the database.

The word "latest" doesn't seem very illuminating to me. If you take
that to its (illogical) conclusion, that would mean that we ought to
do everything under SnapshotNow - i.e. every time we fetch a tuple,
use the latest available view of the database.  It seems to me that
you can wrap some logic around this - we shouldn't use a snapshot
taken later than event1 because reason1, and we shouldn't use one
taken earlier than event2 because reason2.

It seems to me that the *latest* snapshot we could use would be one
taken the instant before we did any calculation whose result might
depend on our choice of snapshot.  For example, if the query involves
calculating pi out to 5000 decimal places (without looking at any
tables) and then scanning for the matching value in some table column,
we could do the whole calculation prior to taking a snapshot and then
take the snapshot only when we start groveling through the table.
That view would be later than the one we use now, but still
correct.

On the other hand, it seems to me that the *earliest* snapshot we can
use is one taken the instant after we receive the protocol message
that tells us to execute the query.  If we take it any sooner than
that, we might fail to see as committed some transaction which was
acknowledged before the user sent the message.

Between those two extremes, it seems to me that when exactly the
snapshot gets taken is an implementation detail.

 and it seems to me that if anything the semantics
 of such a thing are far *less* intuitive than it would be to do the
 whole thing under a single snapshot.

 In that case you must be of the opinion that extended query protocol
 is a bad idea and we should get rid of it, and the same for prepared
 plans of all types.  What you're basically proposing is that simple
 query mode will act differently from other ways of submitting a query,
 and I don't think that's a good idea.

I don't see why anything I said would indicate that we shouldn't have
prepared plans.  It is useful for users to have the option to parse
and plan before execution - especially if they want to execute the
same query repeatedly - and if they choose to make use of that
functionality, then we and they will have to deal with the fact that
things can change between plan time and execution time.  If that means
we miss some optimization opportunities, so be it.  But we needn't
deliver the semantics associated with the extended query protocol when
the user isn't using it; and the next time we bump the protocol
version we probably should give some thought to making sure that you
only need to use the extended query protocol when you explicitly want
to separate parse/plan from execution, and not just to get at some
other functionality that we've only chosen to provide using the
extended protocol.

 It might be sane if planning
 could be assumed to take zero time, but that's hardly true.

I still maintain that the length of planning is irrelevant; what's more, if
the planning and execution are happening in response to a single
protocol message, then the semantics of the query need not (and
perhaps even should not) depend on how much of that time is spent
planning and how much is spent executing.

 I also think you are dismissing Simon's stable-expression-folding
 proposal far too lightly.  I am not sure that the behavior he wants is
 safe given the current details of our implementation - or even with my
 patch; I suspect a little more than that is needed - but I am pretty
 certain it's the behavior that users want and expect, and we should be
 moving toward it, not away from it.  I have seen a significant number
 of cases over the years where the query optimizer generated a bad plan
 because it did less constant-folding than the user expected.

 This is just FUD, unless you can point to specific examples where
 Marti's patch won't fix it.  If that patch crashes and burns for
 some reason, then we should revisit this idea; but if it succeeds
 it will cover more cases than plan-time constant folding could.

I haven't reviewed the two patches in enough detail to have a clear
understanding of which use cases each one does and does not cover.
But, for example, you wrote this:

tgl> As far as partitioning goes, the correct solution there
tgl> is to move the partition selection to run-time, so we should not be
tgl> contorting query semantics to make incremental performance improvements
tgl> with the existing partitioning infrastructure.

...and 

Re: [HACKERS] why do we need two snapshots per query?

2011-11-13 Thread Florian Pflug
On Nov 14, 2011, at 00:13, Robert Haas wrote:
 On Sun, Nov 13, 2011 at 12:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 In that case you must be of the opinion that extended query protocol
 is a bad idea and we should get rid of it, and the same for prepared
 plans of all types.  What you're basically proposing is that simple
 query mode will act differently from other ways of submitting a query,
 and I don't think that's a good idea.
 
 I don't see why anything I said would indicate that we shouldn't have
 prepared plans.  It is useful for users to have the option to parse
 and plan before execution - especially if they want to execute the
 same query repeatedly - and if they choose to make use of that
 functionality, then we and they will have to deal with the fact that
 things can change between plan time and execution time.

The problem, or at least what I perceived to be the problem, is that
protocol-level support for prepared plans isn't the only reason to use
the extended query protocol. The other reasons are protocol-level control
over text vs. binary format, and out-of-line parameters.

In my experience, it's hard enough as it is to convince developers to
use statement parameters instead of interpolating them into the SQL
string. Once word gets out that the simple protocol now has less locking
overhead than the extended protocol, it's going to get even harder...

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FDW system columns

2011-11-13 Thread Florian Pflug
On Nov 13, 2011, at 01:38, Tom Lane wrote:
 Just a couple hours ago I was wondering why we create system columns for
 foreign tables at all.  Is there a reasonable prospect that they'll ever
 be useful?  I can see potential value in tableoid, but the others seem
 pretty dubious --- even if you were fetching from a remote PG server,
 the XIDs would not be meaningful within our own environment.

At least ctid seems useful too. I've used that in the past as a poor man's
surrogate primary key.

Also, people have used ctid and xmin in the past to re-find previously
visited rows and to check whether they've been modified. So there might be
some value in keeping xmin around also (and making the postgres FDW populate it).
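
The re-find idiom referred to here is an optimistic-locking pattern
along these lines (hypothetical table, placeholder ctid/xmin values):

SELECT ctid, xmin FROM accounts WHERE id = 42;
-- ... later, update only if the row was not modified in the meantime:
UPDATE accounts SET balance = balance - 10
WHERE id = 42 AND ctid = '(0,1)' AND xmin = '1234'::xid;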

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why do we need two snapshots per query?

2011-11-13 Thread Robert Haas
On Sun, Nov 13, 2011 at 6:45 PM, Florian Pflug f...@phlo.org wrote:
 On Nov 14, 2011, at 00:13, Robert Haas wrote:
 On Sun, Nov 13, 2011 at 12:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 In that case you must be of the opinion that extended query protocol
 is a bad idea and we should get rid of it, and the same for prepared
 plans of all types.  What you're basically proposing is that simple
 query mode will act differently from other ways of submitting a query,
 and I don't think that's a good idea.

 I don't see why anything I said would indicate that we shouldn't have
 prepared plans.  It is useful for users to have the option to parse
 and plan before execution - especially if they want to execute the
 same query repeatedly - and if they choose to make use of that
 functionality, then we and they will have to deal with the fact that
 things can change between plan time and execution time.

 The problem, or at least what I perceived to be the problem, is that
 protocol-level support for prepared plans isn't the only reason to use
 the extended query protocol. The other reasons are protocol-level control
 over text vs. binary format, and out-of-line parameters.

 In my experience, it's hard enough as it is to convince developers to
 use statement parameters instead of interpolating them into the SQL
 string. Once word gets out that the simple protocol now has less locking
 overhead than the extended protocol, it's going to get even harder...

Well, if our goal in life is to allow people to have protocol control
over text vs. binary format and support out-of-line parameters without
requiring multiple protocol messages, we can build that facility into
the next version of the protocol.  I know Kevin's been thinking about
working on that project for a number of reasons, and this would be a
good thing to get on the list.

On the other hand, if our goal in life is to promote the extended
query protocol over the simple query protocol at all costs, then I
agree that we shouldn't optimize the simple query protocol in any way.
 Perhaps we should even post a big notice on it that says "this
facility is deprecated and will be removed in a future version of
PostgreSQL".  But why should that be our goal?  Presumably our goal is
to put forward the best technology, not to artificially pump up one
alternative at the expense of some other one.  If the simple protocol
is faster in certain use cases than the extended protocol, then let
people use it.  I wouldn't have noticed this optimization opportunity
in the first place but for the fact that psql seems to use the simple
protocol - why does it do that, if the extended protocol is
universally better?  I suspect that, as with many other things where
we support multiple alternatives, the best alternative depends on the
situation, and we should let users pick depending on their use case.

At any rate, if you're concerned about the relative efficiency of the
simple query protocol versus the extended protocol, it seems that the
horse has already left the barn.  I just did a quick 32-client pgbench
-S test on a 32-core box.  This is just a thirty-second run, but
that's enough to make the point: if you're not using prepared queries,
using the extended query protocol incurs a significant penalty - more
than 15% on this test:

[simple] tps = 246808.409932 (including connections establishing)
[extended] tps = 205609.438247 (including connections establishing)
[prepared] tps = 338150.881389 (including connections establishing)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] psql history vs. dearmor (pgcrypto)

2011-11-13 Thread Tomas Vondra
Hi,

I've noticed that psql query buffer somehow interferes with dearmor
(from pgcrypto), corrupting the data. For example this works fine:

SELECT dearmor('-BEGIN PGP PUBLIC KEY BLOCK-
Version: GnuPG v2.0.17 (GNU/Linux)

mQGiBE7AfUoRBACpupjE5tG9Fh1dWe2kb/yX+lNlMLpwMj1hjTrJo1cYmSYixkGX
Si90ZIjn0IOSU7XOkFai8btpbFGyGSdaB9BQK7s8ItN/wx9IHcnB83Lbex3aF/VS
hN81VummzKQ0YB+Crwp1mu1l76UrTg6sPnY+wHj3jPOleXcX9L9UAAzOnwCgi4OS
JoRzR/pPiWtW0Nk5qnYhuZMD/RyNYbKkoNVO4WUnfOFMqm2zIqRXmMnkXS6gNPsd
RNVXb4ByFSzugsZKW5ez9+zS0G0aarySQIuGgPGKSeZezYtwKR3DH676MmdnNSvx
GiGDQW+hSXBOiBOmxhZfBK8H6JfmEtUpZwA8tkzD0u6ikZjQZR0cRux/tdutzTuZ
YGyaA/4tWzKtQP+WDi5tUPNO1/7EcBphYMvZDfNzYUn5ZwXzw5B5YSi0rdY6ZLSP
H3X8hrHbSmDrD8KseLtl9E4YvaOWd0BZCg9QwUcVrR+9sYtyNy/ztX++vVOtFjQ6
b19rj0853fwSgv9gHoNelmBXs0jTDGaKSBwzTD8GYtusQcu3lbQZYWFhYWEgPGFh
YWFhQGV4YW1wbGUuY29tPohiBBMRAgAiBQJOwH1KAhsDBgsJCAcDAgYVCAIJCgsE
FgIDAQIeAQIXgAAKCRCNcpg0BUjyDOKPAJ4viutaojyBhV0ICJED09ArUXgZ7ACf
U6CX156L6i6x8UzRLFxsvVKHXIK5AQ0ETsB9ShAEAMDqwXmBeJGqWgXrtVKh6XIw
uanQtl/lIhktVcAYa/FHnvleL9RqI6JpiVWuvLfOdDcUQmh3MvsmD6h6plVmg/bz
/y1ZGnWANjCazmSWDjTfuIX+wuWo4TKSRhXzUd5tw5bgaeC0Hvy+rlgswRILFYL1
5I0/NTm+fFkB0McY9E2fAAMHBACgpmaAW/VR4IGn+j74GCzn2W06UnnWSK7A0GPJ
kUiJa37mv04yCeIqmoTVkl5rnz8dZZUwJVKYwlRvvLB/omIdzRkouhK/QWioRQ+M
B5qPXjRNrcUnruWVzC3XfhZ6sImI8bh2tHpN1/r0hHXFb/5078Bv2d4Cq2WdMZJo
oGDxBIhJBBgRAgAJBQJOwH1KAhsMAAoJEI1ymDQFSPIM7RcAn22lbnWNWiGby9SU
mEQSkrE34O8+AKCFFPLQiCs3/EL3+2DsplWOnEcSuQ==
=Q6Oq
-END PGP PUBLIC KEY BLOCK-');


but recalling it from the query buffer results in

  ERROR:  Corrupt ascii-armor

I've noticed this on 9.1 but 9.2devel behaves exactly the same. I'm
using 64-bit Linux with UTF8, nothing special.


Tomas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why do we need two snapshots per query?

2011-11-13 Thread Robert Haas
On Sun, Nov 13, 2011 at 7:37 PM, Robert Haas robertmh...@gmail.com wrote:
 In my experience, it's hard enough as it is to convince developers to
 use statement parameters instead of interpolating them into the SQL
 string. Once word gets out that the simple protocol now has less locking
 overhead than the extended protocol, it's going to get even harder...

 [ discussion of convincing people to use ... ]

 At any rate, if you're concerned about the relative efficiency of the
 simple query protocol versus the extended protocol, it seems that the
 horse has already left the barn.

On further examination, it seems that the behavior of the current code
is as follows:

pgbench -n -S -t 2000 == ~4000 snapshots
pgbench -n -S -t 2000 -M extended == ~6000 snapshots
pgbench -n -S -t 2000 -M prepared == ~4000 snapshots

So it's already the case that simple protocol has less locking
overhead than the extended protocol, unless you're using prepared
queries.  The -M prepared case appears to be doing just about exactly
the same thing that happens in the simple case: we take a snapshot in
exec_bind_message() and then release it a nanosecond before calling
PortalStart(), which promptly takes a new one.  IOW, it looks like the
same optimization that applies to the simple case can be applied here
as well.

In the -M extended case, we take a snapshot from exec_parse_message(),
and the same two in the exec_bind_message() call that are taken in the
-M prepared case.  So reducing the prepared case from two snapshots to
one will reduce the extended case from three snapshots to two, thus
saving one snapshot per query regardless of how it's executed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FDW system columns

2011-11-13 Thread Robert Haas
On Sun, Nov 13, 2011 at 6:57 PM, Florian Pflug f...@phlo.org wrote:
 On Nov 13, 2011, at 01:38, Tom Lane wrote:
 Just a couple hours ago I was wondering why we create system columns for
 foreign tables at all.  Is there a reasonable prospect that they'll ever
 be useful?  I can see potential value in tableoid, but the others seem
 pretty dubious --- even if you were fetching from a remote PG server,
 the XIDs would not be meaningful within our own environment.

 At least ctid seems useful too. I've used that in the past as a poor man's
 surrogate primary key.

 Also, people have used ctid and xmin in the past to re-find previously
 visited rows and to check whether they've been modified. So there might be
 some value in keeping xmin around also (and making the postgres FDW populate it).

My vote is to nuke 'em all.  :-)

I don't think that we want to encourage people to depend on the
existence of system columns any more than they do already.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why do we need two snapshots per query?

2011-11-13 Thread Robert Haas
On Sun, Nov 13, 2011 at 8:57 PM, Robert Haas robertmh...@gmail.com wrote:
 In the -M extended case, we take a snapshot from exec_parse_message(),
 and the same two in the exec_bind_message() call that are taken in the
 -M prepared case.  So reducing the prepared case from two snapshots to
 one will reduce the extended case from three snapshots to two, thus
 saving one snapshot per query regardless of how it's executed.

And here are the revised patches.  Apply refactor-portal-start
(unchanged) first and then just-one-snapshot-v2.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


just-one-snapshot-v2.patch
Description: Binary data


refactor-portal-start.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cause of intermittent rangetypes regression test failures

2011-11-13 Thread Jeff Davis
On Sun, 2011-11-13 at 15:38 -0500, Tom Lane wrote:
 If the table has been analyzed, then the
 most_common_values array for column ir will consist of 
   {empty}
 which is entirely correct since that value accounts for 16% of the
 table.  And then, when mcv_selectivity tries to estimate the selectivity
 of the << condition, it applies range_before to the empty range along
 with the int4range(100,500) value, and range_before spits up.
 
 I think this demonstrates that the current definition of range_before is
 broken.  It is not reasonable for it to throw an error on a perfectly
 valid input ... at least, not unless you'd like to mark it VOLATILE so
 that the planner will not risk calling it.
 
 What shall we have it do instead?

We could have it return NULL, I suppose. I was worried that that would
lead to confusion between NULL and the empty range, but it might be
better than marking it VOLATILE.
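
Under that idea, the failing case from upthread would presumably just
yield NULL rather than an error:

select 'empty'::int4range << int4range(100,500);  -- NULL instead of ERROR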

Thoughts, other ideas?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers