Re: [HACKERS] Finer Extension dependencies
On Wed, Mar 28, 2012 at 8:52 AM, Robert Haas robertmh...@gmail.com wrote: On Wed, Mar 28, 2012 at 11:28 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: In practice, however, that sounds like a real pain in the neck. I would expect most people who were packaging extensions to handle a situation like this by forcing the user to provide the name of the function to be called, either via a control table or via a GUC. And once you've done that, it makes no sense to shove a feature dependency into the extension, because the user might very well just write an appropriate function themselves and tell the extension to call that. I don't know what you're talking about here; all I can say is that it has nothing to do with what the patch is implementing. What's in the patch is a way to depend on known versions of an extension rather than the extension wholesale, whatever the version. Using feature dependencies allows avoiding two particularly messy things: - imposing a version numbering scheme with a comparator - maintaining a separate feature matrix So instead of having to say foo version 1.2 is now doing buzz and having an extension depend on foo >= 1.2, you can say that your extension depends on the buzz feature. That's about it. Based on this information, it seems that I've misinterpreted the purpose of the patch. Since extension features seem to live in a global namespace, I assumed that the purpose of the patch was to allow both extension A and extension B to provide feature F, and extension C to depend on F rather than A or B specifically. What I understand you to be saying here is that's not really what you're trying to accomplish. Instead, you're just concerned about allowing some but not all versions of package A to provide feature F, so that other extensions can depend on F to get the specific version of A that they need (and not, as I had assumed, so that they can get either A or B). Let me think more about that.
Maybe I'm just easily confused here, or maybe there is something that should be changed in the code or docs; I'm not sure yet. On a more prosaic note, you seem to have made a mistake when generating the v5 diff. It includes reverts of a couple of unrelated, recent patches. WTF? On a further note, I have spent a heck of a lot more time reviewing other people's patches this CommitFest than you have, and I don't appreciate this. If you'd rather that I didn't spend time on this patch, I have plenty of other things to do with my time. Frankly I'm still against this patch. Since I started to review it I've never been convinced by the use case. Yeah, someone said it'd be useful to him, but as a developer of some PGXN modules I don't see it. I totally agree with Robert's point that a feature is not standardized and nobody can tell how you can depend on the feature in the end. Mind you, I've never heard of building a dependency by name as a string in other packaging systems. If you want to introduce the concept of version dependency rather than feature name dependency, do *it*; I don't think feature dependency solves it. I know that you, Dimitri, are working so hard on this and other patches, but it seems to me that the quality of both the design and the patch code is not adequate at this point in time. I think I agree we are not 100% happy with the current dependency system of extensions, but we need more time to think and make it a mature idea rather than rushing and pushing and dropping something premature. The cost we would pay if we rushed this into this release would be higher than what we'd get from it, I think. Thanks, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend for same-role
On Mon, Mar 26, 2012 at 12:14 AM, Jeff Davis pg...@j-davis.com wrote: On Tue, 2012-03-20 at 01:38 -0700, Daniel Farina wrote: On Thu, Mar 15, 2012 at 9:39 PM, Fujii Masao masao.fu...@gmail.com wrote: On Fri, Mar 16, 2012 at 8:14 AM, Daniel Farina dan...@heroku.com wrote: Parallel to pg_cancel_backend, it'd be nice to allow the user to just outright kill a backend that they own (politely, with a SIGTERM), aborting any transactions in progress, including the idle transaction, and closing the socket. +1 Here's a patch implementing the simple version, with no more guards against signal racing than have been seen previously. The more elaborate variants to close those races are being discussed in a parallel thread, but I thought I'd get this simple version out there. Review: After reading through the threads, it looks like there was no real objection to this approach -- pid recycling is not something we're concerned about. I think you're missing a doc update though, in func.sgml: For the less restrictive <function>pg_cancel_backend</>, the role of an active backend can be found from the <structfield>usename</structfield> column of the <structname>pg_stat_activity</structname> view. Also, high-availability.sgml makes reference to pg_cancel_backend(), and it might be worthwhile to add an "...and pg_terminate_backend()" there. Other than that, it looks good to me. Good comments. Patch attached to address the doc issues. The only iffy thing is that the paragraph "For the less restrictive..." I have opted to remove in its entirety. I think the documents are already pretty clear about the same-user rule, and I'm not sure if this is the right place for a crash-course on attributes in pg_stat_activity (but maybe it is). "...and pg_terminate_backend" seems exactly right. And I think now that the system post-patch doesn't have such a strange contrast between the ability to send SIGINT vs. SIGTERM, such a contrast may not be necessary.
I'm also not sure what the policy is about filling paragraphs in the manual. I filled one, which increases the sgml churn a bit. git (show|diff) --word-diff helps clean it up. -- fdr Extend-same-role-backend-management-to-pg_terminate_backend-v2.patch Description: Binary data
Re: [HACKERS] Finer Extension dependencies
Hitoshi Harada umi.tan...@gmail.com writes: Frankly I'm still against this patch. Since I started to review it I've never been convinced by the use case. Yeah, someone said it'd be useful to him, but as a developer of some PGXN modules I don't see it. I totally agree with Robert's point that a feature is not standardized and nobody can tell how you can depend on the feature in the end. Mind you, I've never heard of building a dependency by its Ok, we might need to find another word for the concept here. Will think, would appreciate native speakers' thought. name as a string in other packaging systems. If you want to introduce the concept of version dependency rather than feature name dependency, do *it*; I don't think feature dependency solves it. I don't want to introduce version dependency, because I don't think we need it. If you want to compare what we're doing here with, say, Debian packaging, then look at how they package libraries. The major version number is now part of the package name and you depend on that directly. So let's take the shortcut to directly depend on the “feature” name. For a PostgreSQL extension example, we could pick ip4r. That will soon include support for ipv6 (it's already done code-wise, missing a docs update). If you want to use ip4r for storing ipv6, you will simply require “ip6r” or whatever feature name is provided by the extension including it. If you really think this cannot be made to work in your use cases, please provide us with an example where it fails. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Finer Extension dependencies
On Thu, Mar 29, 2012 at 12:51 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Hitoshi Harada umi.tan...@gmail.com writes: Frankly I'm still against this patch. Since I started to review it I've never been convinced by the use case. Yeah, someone said it'd be useful to him, but as a developer of some PGXN modules I don't see it. I totally agree with Robert's point that a feature is not standardized and nobody can tell how you can depend on the feature in the end. Mind you, I've never heard of building a dependency by its Ok, we might need to find another word for the concept here. Will think, would appreciate native speakers' thought. name as a string in other packaging systems. If you want to introduce the concept of version dependency rather than feature name dependency, do *it*; I don't think feature dependency solves it. I don't want to introduce version dependency, because I don't think we need it. If you want to compare what we're doing here with, say, Debian packaging, then look at how they package libraries. The major version number is now part of the package name and you depend on that directly. So let's take the shortcut to directly depend on the “feature” name. For a PostgreSQL extension example, we could pick ip4r. That will soon include support for ipv6 (it's already done code-wise, missing a docs update). If you want to use ip4r for storing ipv6, you will simply require “ip6r” or whatever feature name is provided by the extension including it. So my question is why you cannot depend on ip4r in that case. If some version of the module introduces ipv6, then let's depend on that version. It doesn't explain why a string feature name is needed. Thanks, -- Hitoshi Harada
Re: [HACKERS] Standbys, txid_current_snapshot, wraparound
On Wed, Mar 28, 2012 at 10:54:58PM +0100, Simon Riggs wrote: On Wed, Mar 28, 2012 at 10:24 PM, Simon Riggs si...@2ndquadrant.com wrote: On Wed, Mar 28, 2012 at 9:48 PM, Marko Kreen mark...@gmail.com wrote: On Fri, Mar 23, 2012 at 08:52:40AM +, Simon Riggs wrote: Master: pg_controldata - OK, txid_current_snapshot() - OK. Standby: pg_controldata - OK, txid_current_snapshot() - lower value. On the Skytools list there is a report about a master with slaves, but the lower value appears on the master too: http://lists.pgfoundry.org/pipermail/skytools-users/2012-March/001601.html Cc'd original reporter too. Thanks. Am looking. I can't see how this could happen on the master at all. On the standby, it can happen if we skip restartpoints for about a couple of billion xids. Which would be a problem. More on this tomorrow. I can't find a place where WAL replay updates values under XLogCtl. If that really does not happen, that would explain why standbys can see a wrong epoch. No clue yet how the master can get broken. -- marko
Re: [HACKERS] Finer Extension dependencies
Hitoshi Harada umi.tan...@gmail.com writes: So my question is why you cannot depend on ip4r in that case. If some version of the module introduces ipv6, then let's depend on that version. It doesn't explain why a string feature name is needed. The only operator we have to compare version strings in PostgreSQL extensions is string equality. That's because we don't need more, and designing a version scheme policy would be far more work than any benefit I could imagine. And as we didn't enforce a version naming policy in 9.1, it could be argued that it's too late, too. When you say 'requires = ip6r' you are depending on *any* version of the extension that is providing it, whatever its version string. You don't have to know that '1.05' < '1.06' < '1.1' and you don't have to know that the first version with ipv6 support was called '1.06'. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
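The provides/requires scheme being discussed could be sketched in extension control files roughly as follows. This is an illustration of the idea only: the file contents are hypothetical, the `provides` parameter is the patch's proposal (not released syntax), and the version string follows the '1.06' example from this thread.

```ini
# ip4r.control -- hypothetical sketch following this thread's proposal
default_version = '1.06'
# Feature names this version provides; per the discussion, the extension's
# own name ('ip4r') is registered as a feature even if not listed here.
provides = 'ip6r'

# myext.control -- an extension that needs ipv6 support
# Depends on a feature name, not a version string, so any ip4r release
# that provides 'ip6r' satisfies it; only string equality is needed.
requires = 'ip6r'
```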
Re: [HACKERS] Finer Extension dependencies
Hi, Thanks for your review! Robert Haas robertmh...@gmail.com writes: I think the lack of pg_upgrade support is a must-fix before commit. I thought that would only be a TODO for 9.2 to 9.3 upgrades. When upgrading from 9.1 to 9.2, pg_upgrade will directly stuff extensions using InsertExtensionTuple() with an empty features list. That will call insert_extension_features() which ensures that the extension name is registered as a feature even when not explicitly listed (backward compatibility with control files). So I think we're covered now, but still need to revisit the issue later. I edited the comment this way: List *features = NIL; /* 9.3 should get features from catalogs */ Don't we need some psql support for listing the features provided by an installed extension? And an available extension? It's already in the pg_available_extension_versions system view, which is not covered by psql (yet). Do we want a new \dx derived command? get_extension_feature_oids seems to be misnamed, since it returns only one OID, not more than one, and it also returns the feature OID. At a minimum, I think you need to delete the s from the function name; possibly it should be renamed to lookup_extension_feature or somesuch. Done. List *requires; /* names of prerequisite extensions */ + List *provides; /* names of provided features */ Comment in requires line of above hunk should be updated to say "prerequisite features". Done. - Useless hunk. That must be my editor adding a final line when I visit files… and I can't seem to be able to prevent this hunk from happening? +errmsg("parameter \"%s\" must be a list of extension names", This error, which relates to the parsing of the provides list, should say "extension features", and the existing code for requires needs to be updated to say that as well. Done.
+errmsg("extension feature \"%s\" already exists [%u]", + feature, featoid))); That [%u] at the end there does not conform to our message style guidelines, and I think the rest of the message could be improved a bit as well. I think maybe it'd be appropriate to work the name of the other extension into the message, maybe something like: "extension %s provides feature %s, but existing extension %s already provides this feature" Done. +extern char * get_extension_feature_name(Oid featoid); Extra space. Fixed. + if (strcmp(curreq,pcontrol->name) != 0) Missing space (after the comma). Fixed. OCLASS_EXTENSION_FEATURE should probably be added to the penultimate switch case in ATExecAlterColumnType. Right, done now. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Standbys, txid_current_snapshot, wraparound
On Wed, Mar 28, 2012 at 10:54 PM, Simon Riggs si...@2ndquadrant.com wrote: On Wed, Mar 28, 2012 at 10:24 PM, Simon Riggs si...@2ndquadrant.com wrote: On Wed, Mar 28, 2012 at 9:48 PM, Marko Kreen mark...@gmail.com wrote: On Fri, Mar 23, 2012 at 08:52:40AM +, Simon Riggs wrote: Master: pg_controldata - OK, txid_current_snapshot() - OK. Standby: pg_controldata - OK, txid_current_snapshot() - lower value. On the Skytools list there is a report about a master with slaves, but the lower value appears on the master too: http://lists.pgfoundry.org/pipermail/skytools-users/2012-March/001601.html Cc'd original reporter too. Thanks. Am looking. I can't see how this could happen on the master at all. On the standby, it can happen if we skip restartpoints for about a couple of billion xids. Which would be a problem. More on this tomorrow. I've not been able to recreate the problem up till now. But knowing there is a bug helps develop a theory based upon the code. CreateCheckpoint() increments the epoch on the master at the next checkpoint after wraparound. (I'd be happier if there was an explicit link between those two points; there's not but I can't yet see a problem). When the standby receives the checkpoint record, it stores the information in 2 places: i) directly into ControlFile->checkPointCopy ii) and then into XLogCtl when a safe restartpoint occurs It's possible that a safe restartpoint could be delayed. When that happens, the XLogCtl copy grows stale. If the delay is long enough, then the NextXid counter will increase and will eventually be higher than the last XLogCtl copy, causing the epoch returned by GetNextXidAndEpoch() to go backwards by 1. If it is delayed even further it would wrap again and increase again by one, then decrease, then increase. Given enough time and/or a very busy server using lots of xids. At the same time, when we do UpdateControlFile() the other copy of the epoch is written to the controlfile, so the controlfile shows the accurate value for the epoch.
So I can explain how we get two different answers from the standby, and I can explain how the error is very hard to reproduce and apparently transient. I can't explain anything involving the master. The key is "what delays the restartpoint?". And the answer there is probably bugs in index code which purport to see invalid pages when they probably shouldn't be there. So a REINDEX would likely help. I'll look at a patch to improve this. Definite bug and will be backpatched. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
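The stale-restartpoint theory above can be illustrated with a toy model of the epoch computation. This is a deliberate simplification of what GetNextXidAndEpoch() does, not the actual code; the function and argument names here are invented:

```python
def epoch_from_checkpoint_copy(next_xid, ckpt_xid, ckpt_epoch):
    """Derive the current epoch from the last checkpoint data cached in
    shared memory: if the 32-bit xid counter has wrapped since that
    checkpoint (next_xid is now below ckpt_xid), the epoch is one higher
    than the epoch recorded at the checkpoint."""
    epoch = ckpt_epoch
    if next_xid < ckpt_xid:
        epoch += 1
    return epoch

# Checkpoint cached just before a wraparound: xid 4,000,000,000 in epoch 5.
# Shortly after the wrap, the derived epoch is correctly 6:
assert epoch_from_checkpoint_copy(100, 4_000_000_000, 5) == 6

# But if restartpoints are skipped and the cached copy goes stale while
# billions more xids are consumed, next_xid climbs back above the stale
# ckpt_xid, and the derived epoch drops back to 5 even though the real
# epoch is still 6 -- the transient "epoch goes backwards" symptom:
assert epoch_from_checkpoint_copy(4_000_000_100, 4_000_000_000, 5) == 5
```

Refreshing the cached checkpoint data (ckpt_xid and ckpt_epoch together) makes the same call return the correct epoch again, which is why the symptom is so hard to reproduce.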
Re: [HACKERS] Standbys, txid_current_snapshot, wraparound
On Thu, Mar 29, 2012 at 10:37:54AM +0100, Simon Riggs wrote: When the standby receives the checkpoint record, it stores the information in 2 places: i) directly into ControlFile->checkPointCopy ii) and then into XLogCtl when a safe restartpoint occurs In RecoveryRestartPoint() I see: - memcpy(XLogCtl->lastCheckPoint, checkPoint, sizeof(CheckPoint)); but I still don't see how the ckptXid* values are updated? What am I missing? -- marko
Re: [HACKERS] patch for parallel pg_dump
On Wed, Mar 28, 2012 at 9:54 PM, Joachim Wieland j...@mcknight.de wrote: On Wed, Mar 28, 2012 at 1:46 PM, Robert Haas robertmh...@gmail.com wrote: I'm wondering if we really need this much complexity around shutting down workers. I'm not sure I understand why we need both a hard and a soft method of shutting them down. At least on non-Windows systems, it seems like it would be entirely sufficient to just send a SIGTERM when you want them to die. They don't even need to catch it; they can just die. At least on my Linux test system, even if all pg_dump processes are gone, the server happily continues sending data. When I strace an individual backend process, I see a lot of "Broken pipe" writes, but that doesn't stop it from just writing out the whole table to a closed file descriptor. This is a 9.0-latest server. Wow, yuck. At least now I understand why you're doing it like that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On Wed, Mar 28, 2012 at 10:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: The SELECT INTO tests all fail, but we know the reason why (the testbed isn't expecting them to result in creating separate entries for the utility statement and the underlying plannable SELECT). This might be a dumb idea, but for a quick hack, could we just rig SELECT INTO, CREATE TABLE AS, and EXPLAIN not to create entries for themselves at all, without suppressing creation of an entry for the underlying query? The output might be slightly misleading but it wouldn't be broken, and I'm disinclined to want to spend a lot of time doing fine-tuning right now. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Command Triggers patch v18
On Wed, Mar 28, 2012 at 3:41 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: What about:
create command trigger foo before prepare alter table …
create command trigger foo before start of alter table …
create command trigger foo before execute alter table …
create command trigger foo before end of alter table …
create command trigger foo when prepare alter table …
create command trigger foo when start alter table …
create command trigger foo when execute of alter table …
create command trigger foo when end of alter table …
create command trigger foo preceding alter table …
create command trigger foo preceding alter table … deferred
create command trigger foo preceding alter table … immediate
create command trigger foo following parser of alter table …
create command trigger foo preceding execute of alter table …
create command trigger foo following alter table …
create command trigger foo before alter table nowait …
I'm not sure how many hooks we can really export here, but I guess we have enough existing keywords to describe when they get to run (parser, mapping, lock, check, begin, analyze, access, disable, not exists, do, end, escape, extract, fetch, following, search…) I am sure that we could find a way to beat this with a stick until it behaves, but I don't really like that idea. It seems to me to be a case of adding useless parser bloat. Prior to 9.0, when we introduced the new EXPLAIN syntax, new EXPLAIN options were repeatedly proposed and partly rejected on the grounds that they weren't important enough to justify adding new keywords. We've now got EXPLAIN (COSTS, BUFFERS, TIMING, FORMAT) and there will probably be more in the future, and the parser overhead of adding a new one is zero.
I think we should learn from that lesson: when you may want to have a bunch of different options and it's not too clear what they're all going to be named, designing things in a way that avoids a dependency on the main parser having the right keywords leads to less patch rejection and more getting stuff done. I've also had another general thought about safety: we need to make sure that we're only firing triggers from places where it's safe for user code to perform arbitrary actions. In particular, we have to assume that any triggers we invoke will AcceptInvalidationMessages() and CommandCounterIncrement(); and we probably can't do it at all (or at least not without some additional safeguard) in places where the code does CheckTableNotInUse() up through the point where it's once again safe to access the table, since the trigger might try. We also I've been trying to do that already. need to think about re-entrancy: that is, in theory, the trigger might execute any other DDL command - e.g. it might drop the table that we've been asked to rename. I suspect that some of the current BEFORE That's why I implemented ALTER COMMAND TRIGGER ... SET DISABLE in the first place, so that you could run the same command from the trigger itself without infinite recursion. trigger stuff might fall down on that last one, since the existing code not-unreasonably expects that once it's locked the table, the catalog entries can't go away. Maybe it all happens to work out OK, but I don't think we can count on that. I didn't think of DROP TABLE in a command trigger running BEFORE ALTER TABLE, say. That looks evil. It could be documented as yet another way to shoot yourself in the foot though? Well, it depends somewhat on how it fails. If it fails by crashing the server, for example, I don't think that's going to fly.
My suspicion is that it won't do that, but what it might do is fail in some pretty odd and unpredictable ways, possibly leading to catalog corruption, which I don't feel too good about either. Think about not just ALTER vs. DROP but also ALTER vs. ALTER. It's probably easier to add guards against this kind of thing than it is to prove that it's not going to do anything too wacky; the obvious idea that comes to mind is to bump the command counter after returning from the last trigger (if that doesn't happen already) and then verify that the tuple is still present and has whatever other properties we've already checked and are counting on, and if not chuck an error. I think that for a first version of this feature it probably would be smart to trim this back to something that doesn't involve surgery on the guts of every command; then, we can extend incrementally. Nothing you've proposed seems impossible to me, but most of the really interesting things are hard, and it would be much easier to handle patches intended to cater to more complex use cases if the basic infrastructure were already committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Standbys, txid_current_snapshot, wraparound
On Thu, Mar 29, 2012 at 11:12 AM, Marko Kreen mark...@gmail.com wrote: On Thu, Mar 29, 2012 at 10:37:54AM +0100, Simon Riggs wrote: When the standby receives the checkpoint record, it stores the information in 2 places: i) directly into ControlFile->checkPointCopy ii) and then into XLogCtl when a safe restartpoint occurs In RecoveryRestartPoint() I see: - memcpy(XLogCtl->lastCheckPoint, checkPoint, sizeof(CheckPoint)); but I still don't see how the ckptXid* values are updated? What am I missing? It's updated on the master and then copied into place. Patch coming in a few hours. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 29 March 2012 02:09, Tom Lane t...@sss.pgh.pa.us wrote: Thanks. I've committed the patch along with the docs, after rather heavy editorialization. Thank you. 1. What to do with EXPLAIN, SELECT INTO, etc. We had talked about tweaking the behavior of statement nesting and some other possibilities. I think clearly this could use improvement but I'm not sure just how. (Note: I left out the part of your docs patch that attempted to explain the current behavior, since I think we should fix it not document it.) Yeah, this is kind of unsatisfactory. Nobody would expect the module to behave this way. On the other hand, users probably aren't hugely interested in this information. I'm still kind of attached to the idea of exposing the hash value in the view. It could be handy in replication situations, to be able to aggregate statistics across the cluster (assuming you're using streaming replication and not a trigger based system). You need a relatively stable identifier to be able to do that. You've already sort-of promised to not break the format in point releases, because it is serialised to disk, and may have to persist for months or years. Also, it will drive home the reality of what's going on in situations like this (from the docs): In some cases, queries with visibly different texts might get merged into a single pg_stat_statements entry. Normally this will happen only for semantically equivalent queries, but there is a small chance of hash collisions causing unrelated queries to be merged into one entry. (This cannot happen for queries belonging to different users or databases, however.) Since the hash value is computed on the post-parse-analysis representation of the queries, the opposite is also possible: queries with identical texts might appear as separate entries, if they have different meanings as a result of factors such as different search_path settings. 2. Whether and how to adjust the aging-out of sticky entries. 
This seems like a research project, but the code impact should be quite localized. As I said, I'll try and give it some thought, and do some experiments. BTW, I eventually concluded that the parameterization testing we were worried about before was a red herring. As committed, the patch tries to store a normalized string if it found any deletable constants, full stop. This seems to me to be correct behavior because the presence of constants is exactly what makes the string normalizable, and such constants *will* be ignored in the hash calculation no matter whether there are other parameters or not. Yeah, that aspect of not canonicalising parametrised entries had bothered me. I guess we're better off gracefully handling the problem across the board, rather than attempting to band-aid the problem up by just not having speculative hashtable entries in cases where they arguably are not so useful. Avoiding canonicalising those constants was somewhat misleading. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Finer Extension dependencies
On Thu, Mar 29, 2012 at 4:37 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Hitoshi Harada umi.tan...@gmail.com writes: So my question is why you cannot depend on ip4r in that case. If some version of the module introduces ipv6, then let's depend on that version. It doesn't explain why a string feature name is needed. The only operator we have to compare version strings in PostgreSQL extensions is string equality. That's because we don't need more, and designing a version scheme policy would be far more work than any benefit I could imagine. And as we didn't enforce a version naming policy in 9.1, it could be argued that it's too late, too. When you say 'requires = ip6r' you are depending on *any* version of the extension that is providing it, whatever its version string. You don't have to know that '1.05' < '1.06' < '1.1' and you don't have to know that the first version with ipv6 support was called '1.06'. If I recall previous discussion correctly, there's fairly broad consensus among people Tom talks to that dependencies on specific versions are inferior to dependencies on features provided by those versions. That having been said, it might be hard for package authors to know what features other packages want to depend on; and perhaps any version of a package might add a new feature that someone might want. So it might be that, after we add this feature, nearly all packagers will do one of two things: 1. Not add a provides line at all, thus making it impossible for anyone else to depend on specific versions; or 2. Add a new feature to the provides line with every release that does anything other than fix bugs, leading to: provides = foobar-1.1, foobar-1.2, foobar-1.3, foobar-1.4, foobar-1.5, foobar-1.6, foobar-2.0, foobar-2.1, foobar-2.2, foobar-2.3, foobar-3.0, foobar-3.1 If that turns out to be the case, we may regret designing the feature this way.
Again, the fact that packaging is part of shipping a PostgreSQL extension rather than something that gets tacked on after the fact is not an irrelevant detail. Red Hat can rearrange all the tags they ship at will every time they put out a new version of their distribution, they can enforce uniform standards for how those tags are named, and, maybe most important of all, they don't need to add a tag for every version of the feature that someone *might* want to depend on - only the things that someone *does* want to depend on. Now in spite of all that I'm not sure this is a bad way to go: maybe trying to solve the problem and ending up with something imperfect is better than throwing our hands up in the air and doing nothing. But on the other hand I don't think Harada-san's comments are entirely ill-founded either: how sure are we that this is the right way to go, and what feedback do we have from people who are using this feature that leads us to think this is adequate? Does anyone else have an opinion on this? I think that technically this patch can be polished well enough to commit in the time we have available, but the question of whether it's the right design is harder, and I don't want that to be my call alone. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
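Under the scheme being debated, dependency resolution reduces to string-equality membership tests on feature names rather than version comparisons. A toy sketch (the data, function name, and version strings are invented for illustration, reusing the ip4r example from earlier in the thread):

```python
# Available extension versions and the feature names each one provides.
AVAILABLE = {
    ("ip4r", "1.05"): {"ip4r"},
    ("ip4r", "1.06"): {"ip4r", "ip6r"},  # first version providing ipv6
    ("ip4r", "1.1"):  {"ip4r", "ip6r"},
}

def candidates(required_feature):
    """Return every (extension, version) whose provides set covers the
    required feature. No version comparator is needed: only string
    equality on feature names, as Dimitri argues above."""
    return sorted(name_ver for name_ver, feats in AVAILABLE.items()
                  if required_feature in feats)

# 'requires = ip6r' matches any version providing that feature:
assert candidates("ip6r") == [("ip4r", "1.06"), ("ip4r", "1.1")]
# Depending on the extension's own name matches every version:
assert candidates("ip4r") == [("ip4r", "1.05"), ("ip4r", "1.06"), ("ip4r", "1.1")]
```

Robert's worry is visible in this model too: the scheme only works if packagers keep the provides sets meaningful instead of empty or exhaustively enumerating every release.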
Re: [HACKERS] Finer Extension dependencies
Robert Haas wrote: I think that technically this patch can be polished well enough to commit in the time we have available, but the question of whether it's the right design is harder, and I don't want that to be my call alone.

I gather from previous posts that the intent isn't to allow different packages from different authors to provide a common and compatible feature; but what happens in the current design if someone accidentally or maliciously produces an extension which provides the same feature name as another extension? Would we need some registry?

-Kevin
Re: [HACKERS] ECPG FETCH readahead
On 2012-03-29 12:59, Boszormenyi Zoltan wrote: On 2012-03-29 02:43, Noah Misch wrote: On Sat, Mar 24, 2012 at 10:49:07AM +0100, Boszormenyi Zoltan wrote:

+the window size may be modified by setting the <literal>ECPGFETCHSZ</literal>
+environment variable to a different value.

I had in mind that DECLARE statements adorned with READAHEAD syntax would always behave precisely as written, independent of ECPGFETCHSZ or ecpg -R. Unadorned DECLARE statements would use ECPGFETCHSZ if set, then the value passed to ecpg -R if provided, and finally a value of 1 (no readahead) in the absence of both ECPGFETCHSZ and ecpg -R. Did you do it differently for a particular reason? I don't particularly object to what you've implemented, but I'd be curious to hear your thinking.

What I had in mind was: NO READAHEAD == READAHEAD 0. This will translate into a 1-tuple-sized readahead window that cannot be modified by ECPGFETCHSZ. After the core patch, this is not totally true yet. The NO READAHEAD cursors don't go through the new cursor functions at all. But cursor.c is prepared for the second patch that makes all cursors use the caching code.

Best regards, Zoltán Böszörményi

-- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Re: [HACKERS] Command Triggers patch v18
Robert Haas robertmh...@gmail.com writes: I am sure that we could find a way to beat this with a stick until it behaves, but I don't really like that idea. It seems to me to be a [...] we should learn from that lesson: when you may want to have a bunch of

I first wanted to ensure that reusing existing parser keywords wouldn't be good enough for our case, as I find that solution more elegant. If we can't think of future places where we would want to add support for calling command triggers, then yes, you're right, something more flexible is needed.

I'll go make that happen, and still need input here. We first want to have command triggers on specific commands or ANY command, and we want to implement 3 places from where to fire them. Here's a new syntax proposal to cope with that:

  create command trigger before COMMAND_STEP of alter table execute procedure snitch();

- before the process utility switch, with only command tag and parse tree:
  create command trigger foo before start of alter table execute procedure snitch();

- before running the command, after having done basic error checks, security checks, name lookups and locking, with all information:
  create command trigger before execute of alter table execute procedure snitch();

- after having run the command:
  create command trigger foo before end of alter table execute procedure snitch();

Some other places make sense to add support for, but working within the deadlines we have and with the current patch, those 3 points are what's easy enough to implement now.

I didn't think of DROP TABLE in a command trigger running BEFORE ALTER TABLE, say. That looks evil. It could be documented as yet another way to shoot yourself in the foot, though?

I think that for a first version of this feature it probably would be smart to trim this back to something that doesn't involve surgery on the guts of every command; then, we can extend incrementally.
A simple enough solution here would be to register a list of disabled commands per object id before running the user defined function, and check against that list before allowing it to run. As we have command trigger support for all the commands we want to be able to block, the lookup and ereport() call can be implemented in InitCommandContext() where we already apply some limitations (like no command triggers on command trigger commands). Once that is in place it's possible to refine the pre-condition checks on a per-command basis and stop adding the command to the blacklist.

Regards,

-- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
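The blacklist idea sketched above can be illustrated outside of PostgreSQL. This is not the patch's C code; it is a toy model (invented names, object ids, and command tags) of registering disabled commands per trigger and rejecting them before the user-defined function runs:

```python
# Illustrative sketch only: a per-trigger blacklist of command tags,
# checked before a command trigger's function is allowed to run a
# command -- mirroring the InitCommandContext() idea from the mail.

blacklist = {}  # trigger object id -> set of disabled command tags

def register_disabled(trigger_oid, command_tags):
    """Record commands that may not be run from this trigger."""
    blacklist.setdefault(trigger_oid, set()).update(command_tags)

def check_command(trigger_oid, command_tag):
    """Raise if the command is blacklisted for this trigger."""
    if command_tag in blacklist.get(trigger_oid, set()):
        raise RuntimeError(
            f'{command_tag} is not allowed from this command trigger')

register_disabled(16384, {"DROP TABLE"})
check_command(16384, "ALTER TABLE")   # allowed, no error
try:
    check_command(16384, "DROP TABLE")
except RuntimeError as err:
    print(err)
```

Refining the pre-condition checks per command then amounts to handling a tag specially and no longer registering it in the blacklist.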
Re: [HACKERS] Finer Extension dependencies
On Thu, Mar 29, 2012 at 1:01 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: I gather from previous posts that the intent isn't to allow different packages from different authors to provide a common and compatible feature; but what happens in the current design if someone accidentally or maliciously produces an extension which provides the same feature name as another extension? Would we need some registry?

A good (documented) convention should make that unnecessary, such as packagename.feature, for example: provides = hstore.populate_record. Or something along those lines. Or maybe even prefix it with a Java-style inverse URL of the package's author.

Cheers,

Bene
Re: [HACKERS] Finer Extension dependencies
Kevin Grittner kevin.gritt...@wicourts.gov writes: I gather from previous posts that the intent isn't to allow different packages from different authors to provide a common and compatible feature; but what happens in the current design if someone accidentally or maliciously produces an extension which provides the same feature name as another extension?

It's not about that, it's more like the features/require/provide concepts in Lisp, except that we're not using them to load files. The goal really is to avoid a feature matrix and a version policy with comparators, yet be able to depend on features that got implemented after the first release of an extension.

Would we need some registry?

That being said, we still have a single namespace for extensions and their features, so a registry would help, yes.

-- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Finer Extension dependencies
On Thu, Mar 29, 2012 at 8:01 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Robert Haas wrote: I think that technically this patch can be polished well enough to commit in the time we have available, but the question of whether it's the right design is harder, and I don't want that to be my call alone. I gather from previous posts that the intent isn't to allow different packages from different authors to provide a common and compatible feature; but what happens in the current design if someone accidentally or maliciously produces an extension which provides the same feature name as another extension? Would we need some registry?

One thing I was thinking about was whether we should restrict feature names to be of some specific form, like extension_name:feature_name. That would address this issue, and would also keep people from thinking of this as an alternatives mechanism, as I did. Of course, that doesn't prevent someone from publishing an ip4r module that erases your hard disk, but there's nothing much we can do about that problem from within core PostgreSQL.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
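The extension_name:feature_name restriction suggested above is easy to picture as a validation rule. This is a hypothetical sketch, not anything in the patch; the allowed character set is an assumption modeled loosely on unquoted SQL identifiers:

```python
import re

# Hypothetical validator for the suggestion that feature names be
# restricted to the form extension_name:feature_name. The identifier
# syntax here (lowercase letters, digits, underscores) is an assumption.
FEATURE_RE = re.compile(r'[a-z_][a-z0-9_]*:[a-z_][a-z0-9_]*')

def is_valid_feature(name: str) -> bool:
    """True if name looks like extension_name:feature_name."""
    return FEATURE_RE.fullmatch(name) is not None

print(is_valid_feature("hstore:populate_record"))  # True
print(is_valid_feature("populate_record"))         # False: no namespace
```

Requiring the namespace prefix makes accidental collisions between unrelated extensions impossible without needing any central registry, though it does nothing against a deliberately misleading extension name.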
Re: [HACKERS] Command Triggers patch v18
On 29 March 2012 13:30, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: I'll go make that happen, and still need input here. We first want to have command triggers on specific commands or ANY command, and we want to implement 3 places from where to fire them. Here's a new syntax proposal to cope with that:

  create command trigger before COMMAND_STEP of alter table execute procedure snitch();

- before the process utility switch, with only command tag and parse tree:
  create command trigger foo before start of alter table execute procedure snitch();

- before running the command, after having done basic error checks, security checks, name lookups and locking, with all information:
  create command trigger before execute of alter table execute procedure snitch();

- after having run the command:
  create command trigger foo before end of alter table execute procedure snitch();

Is it necessary to add this complexity in this version? Can't we keep it simple but in a way that allows the addition of this later? The testing of all these new combinations sounds like a lot of work.

-- Thom
Re: [HACKERS] Standbys, txid_current_snapshot, wraparound
On Thu, Mar 29, 2012 at 12:06 PM, Simon Riggs si...@2ndquadrant.com wrote: Patch coming in a few hours.

This is more straightforward than I was thinking. We just need to initialise XLogCtl at the right place.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ff7f521..d268014 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6320,6 +6320,10 @@ StartupXLOG(void)
 	/* No need to hold ControlFileLock yet, we aren't up far enough */
 	UpdateControlFile();
 
+	/* initialize shared-memory copy of latest checkpoint XID/epoch */
+	XLogCtl->ckptXidEpoch = ControlFile->checkPointCopy.nextXidEpoch;
+	XLogCtl->ckptXid = ControlFile->checkPointCopy.nextXid;
+
 	/* initialize our local copy of minRecoveryPoint */
 	minRecoveryPoint = ControlFile->minRecoveryPoint;
 
@@ -6915,10 +6919,6 @@ StartupXLOG(void)
 	/* start the archive_timeout timer running */
 	XLogCtl->Write.lastSegSwitchTime = (pg_time_t) time(NULL);
 
-	/* initialize shared-memory copy of latest checkpoint XID/epoch */
-	XLogCtl->ckptXidEpoch = ControlFile->checkPointCopy.nextXidEpoch;
-	XLogCtl->ckptXid = ControlFile->checkPointCopy.nextXid;
-
 	/* also initialize latestCompletedXid, to nextXid - 1 */
 	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 	ShmemVariableCache->latestCompletedXid = ShmemVariableCache->nextXid;
@@ -8601,6 +8601,17 @@ xlog_redo(XLogRecPtr lsn, XLogRecord *record)
 		ControlFile->checkPointCopy.nextXidEpoch = checkPoint.nextXidEpoch;
 		ControlFile->checkPointCopy.nextXid = checkPoint.nextXid;
 
+		/* Update shared-memory copy of checkpoint XID/epoch */
+		{
+			/* use volatile pointer to prevent code rearrangement */
+			volatile XLogCtlData *xlogctl = XLogCtl;
+
+			SpinLockAcquire(&xlogctl->info_lck);
+			xlogctl->ckptXidEpoch = checkPoint.nextXidEpoch;
+			xlogctl->ckptXid = checkPoint.nextXid;
+			SpinLockRelease(&xlogctl->info_lck);
+		}
+
 		/*
 		 * TLI may change in a shutdown checkpoint, but it shouldn't decrease
 		 */
@@ -8645,6 +8656,17 @@ xlog_redo(XLogRecPtr lsn, XLogRecord *record)
 		ControlFile->checkPointCopy.nextXidEpoch = checkPoint.nextXidEpoch;
 		ControlFile->checkPointCopy.nextXid = checkPoint.nextXid;
 
+		/* Update shared-memory copy of checkpoint XID/epoch */
+		{
+			/* use volatile pointer to prevent code rearrangement */
+			volatile XLogCtlData *xlogctl = XLogCtl;
+
+			SpinLockAcquire(&xlogctl->info_lck);
+			xlogctl->ckptXidEpoch = checkPoint.nextXidEpoch;
+			xlogctl->ckptXid = checkPoint.nextXid;
+			SpinLockRelease(&xlogctl->info_lck);
+		}
+
 		/* TLI should not change in an on-line checkpoint */
 		if (checkPoint.ThisTimeLineID != ThisTimeLineID)
 			ereport(PANIC,
Re: [HACKERS] Command Triggers patch v18
On Thu, Mar 29, 2012 at 9:01 AM, Thom Brown t...@linux.com wrote: On 29 March 2012 13:30, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: I'll go make that happen, and still need input here. We first want to have command triggers on specific commands or ANY command, and we want to implement 3 places from where to fire them. Here's a new syntax proposal to cope with that:

  create command trigger before COMMAND_STEP of alter table execute procedure snitch();

- before the process utility switch, with only command tag and parse tree:
  create command trigger foo before start of alter table execute procedure snitch();

- before running the command, after having done basic error checks, security checks, name lookups and locking, with all information:
  create command trigger before execute of alter table execute procedure snitch();

- after having run the command:
  create command trigger foo before end of alter table execute procedure snitch();

Is it necessary to add this complexity in this version? Can't we keep it simple but in a way that allows the addition of this later? The testing of all these new combinations sounds like a lot of work.

I concur. This is way more complicated than we should be trying to do in version 1.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Standbys, txid_current_snapshot, wraparound
On Thu, Mar 29, 2012 at 02:46:23PM +0100, Simon Riggs wrote: On Thu, Mar 29, 2012 at 12:06 PM, Simon Riggs si...@2ndquadrant.com wrote: Patch coming in a few hours. This is more straightforward than I was thinking. We just need to initialise XLogCtl at the right place.

Looks good to me. That should fix the problem with standbys then.

Next question: how can flipping archive_mode on and off, with restarts, near the wraparound point, break the epoch on the master? http://lists.pgfoundry.org/pipermail/skytools-users/2012-March/001609.html

Any ideas? Could WAL playback from a backend with a different archive_mode setting cause it somehow?

-- marko
Re: [HACKERS] Standbys, txid_current_snapshot, wraparound
On Thu, Mar 29, 2012 at 3:04 PM, Marko Kreen mark...@gmail.com wrote: On Thu, Mar 29, 2012 at 02:46:23PM +0100, Simon Riggs wrote: On Thu, Mar 29, 2012 at 12:06 PM, Simon Riggs si...@2ndquadrant.com wrote: Patch coming in a few hours. This is more straightforward than I was thinking. We just need to initialise XLogCtl at the right place. Looks good to me. That should fix the problem with standbys then. Next question: how can flipping archive_mode on and off, with restarts, near wraparound point, break epoch on master? http://lists.pgfoundry.org/pipermail/skytools-users/2012-March/001609.html Any ideas? Could WAL playback from a backend with a different archive_mode setting cause it somehow?

The current code would allow it, if you managed to go for 4 billion xids between checkpoints. If that happened, it would give an off-by-one error each time it happened. I don't believe that is possible.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
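For readers following the epoch discussion: 64-bit txids of the kind txid_current_snapshot() reports are built from the 32-bit xid counter plus an epoch that bumps each time the counter wraps. A simplified sketch of that arithmetic (it ignores the special low xid values that real PostgreSQL skips over at wraparound):

```python
# Simplified model of the epoch/xid bookkeeping behind 64-bit txids:
# txid = epoch * 2^32 + xid, with the epoch incremented whenever the
# 32-bit counter wraps. Not PostgreSQL source; illustration only.
TWO_TO_32 = 2 ** 32

def to_txid64(epoch: int, xid: int) -> int:
    """Combine epoch and 32-bit xid into a monotonic 64-bit txid."""
    return epoch * TWO_TO_32 + xid

def advance(epoch: int, last_xid: int, next_xid: int):
    """Track the epoch: it bumps when the 32-bit counter wraps."""
    if next_xid < last_xid:
        epoch += 1
    return epoch, next_xid

epoch, xid = advance(0, 4_294_967_290, 5)  # counter wrapped: epoch is now 1
print(epoch, to_txid64(epoch, xid))        # 1 4294967301
```

If the shared-memory epoch is initialised in the wrong place (the bug the patch above fixes), a standby computes to_txid64() with a stale epoch and its 64-bit txids jump backwards across the wraparound.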
Re: [HACKERS] Standbys, txid_current_snapshot, wraparound
On Thu, Mar 29, 2012 at 03:23:01PM +0100, Simon Riggs wrote: On Thu, Mar 29, 2012 at 3:04 PM, Marko Kreen mark...@gmail.com wrote: Next question: how can flipping archive_mode on and off, with restarts, near wraparound point, break epoch on master? http://lists.pgfoundry.org/pipermail/skytools-users/2012-March/001609.html Any ideas? Could WAL playback from a backend with a different archive_mode setting cause it somehow? The current code would allow it, if you managed to go for 4 billion xids between checkpoints. If that happened, it would give an off-by-one error each time it happened. I don't believe that is possible.

Yes, that does not sound like a probable case.

-- marko
[HACKERS] HTTP Frontend? (and a brief thought on materialized views)
Hi guys, Something from Josh's recent blog post about summer of code clicked with me - the HTTP / SQL concept. It was something I'd been thinking about earlier, how people really like HTTP APIs and this is one of the drivers behind adoption of some NoSQL databases out there. Some things that I think are great about NoSQL that are missing in PostgreSQL are:

1. The reduce function, which could be modeled as a kind of materialized view with an index on it. With materialized views and the ability to pull fields from JSON and XML data structures you could easily do a NoSQL database inside of PostgreSQL by saving the document as a big string and then using materialized views to have all kinds of derived data from those documents. Sounds cool on paper, anyway.

2. HTTP RESTful API. This is obviously not as useful as a good admin tool like pgAdmin or a fast binary protocol like we have now, but it's way more buzzword compliant and would come in handy sometimes. With CouchDB I was able to allow public data in the database to be accessed directly from the browser using JSONP and/or a simple HTTP proxy in the server instead of doing any of that work within the app server. So, that saves a step somewhere. With some basic support for stored procedures and serving files directly via this HTTP thing you could potentially eliminate the app server for public, read-only parts of a site/application.

3. The perception of extremely low latency and/or high throughput - like fetching a row in less than 1ms or whatever. I don't know the exact numbers, I just know they have to be insanely low (latency) or impressively high (throughput). We could use PostgreSQL as a query cache to accelerate your MySQL! Something like that :-P.

4. Rebelliousness against the man. In our case SQL can't be the man, but we can find something PostgreSQL isn't and position against that.

Anyway, 1 & 2 seem inspiring to me and I'd love to help out with both. 3 & 4 seem more like marketing tasks of some sort...
I think I could probably start hacking up an HTTP API of some sort. I wasn't able to log into the PostgreSQL Wiki, so I created a page with some notes here: http://dobesv.com/docs/postgresql-http-api.html

For materialized views, I think that can be a bit of a can of worms, especially if you want to do incremental updates of some sort, because you have to figure out what rows need to be recalculated when you update a row in a source table, and how best to update them. But if someone thinks they know how, I can assist in the implementation.

Anyway, I hope I can help AND that I posted this in the correct mailing list! Cheers, Dobes
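The RESTful-API idea boils down to mapping HTTP requests onto SQL. A toy sketch of that mapping follows; the URL scheme, table names, and routing rules are invented for illustration, and a real gateway would additionally need authentication, connection pooling, and strict validation before anything touches the database:

```python
# Toy HTTP-to-SQL routing sketch: translate a RESTful path into a
# parameterized query (never interpolate the id into the SQL string).
# Everything here is hypothetical; not a proposed PostgreSQL API.
import re

def route(method: str, path: str):
    """Map e.g. GET /users/42 to a (query, params) pair."""
    m = re.fullmatch(r'/([a-z_]+)/([0-9]+)', path)
    if not m:
        raise ValueError('unsupported path')
    table, pk = m.group(1), int(m.group(2))
    if method == 'GET':
        return f'SELECT * FROM {table} WHERE id = $1', (pk,)
    if method == 'DELETE':
        return f'DELETE FROM {table} WHERE id = $1', (pk,)
    raise ValueError('unsupported method')

print(route('GET', '/users/42'))
# ('SELECT * FROM users WHERE id = $1', (42,))
```

Note that even this sketch restricts the table name to a safe character class; whitelisting the exposed tables explicitly would be safer still.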
Re: [HACKERS] HTTP Frontend? (and a brief thought on materialized views)
On 03/29/2012 10:37 AM, Dobes Vandermeer wrote: [...]

1. I've been in discussion with some people about adding simple JSON extract functions. We already have some (i.e. xpath()) for XML.

2. You might find htsql http://htsql.org/ interesting.

cheers

andrew
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Robert Haas robertmh...@gmail.com writes: On Wed, Mar 28, 2012 at 10:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: The SELECT INTO tests all fail, but we know the reason why (the testbed isn't expecting them to result in creating separate entries for the utility statement and the underlying plannable SELECT). This might be a dumb idea, but for a quick hack, could we just rig SELECT INTO, CREATE TABLE AS, and EXPLAIN not to create entries for themselves at all, without suppressing creation of an entry for the underlying query? It would make more sense to me to go the other way, that is suppress creation of a separate entry for the contained optimizable statement. The stats will still be correctly accumulated into the surrounding statement (or at least, if they are not, that's a separate pre-existing bug). If we do it in the direction you suggest, we'll fail to capture costs incurred outside execution of the contained statement. Right now, we already have logic in there to track nesting of statements in a primitive way, that is just count the nesting depth. My first idea about fixing this was to tweak that logic so that it stacks a flag saying we're in a utility statement that contains an optimizable statement, and then the first layer of Executor hooks that sees that flag set would know to not do anything. However this isn't quite good enough because that first layer might not be for the same statement. As an example, in an EXPLAIN ANALYZE the planner might pre-execute immutable or stable SQL functions before we reach the executor. We would prefer that any statements embedded in such a function still be seen as independent nested statements. However, I think there is a solution for that, though it may sound a bit ugly. Rather than just stacking a flag, let's stack the query source text pointer for the utility statement. 
Then in the executor hooks, if that pointer is *pointer* equal (not strcmp equal) to the optimizable statement's source-text pointer, we know we are executing the same statement as the surrounding utility command, and we do nothing. This looks like it would work for the SELECT INTO and EXPLAIN cases, and for DECLARE CURSOR whenever that gets changed to a less bizarre structure. It would not work for EXECUTE, because in that case we pass the query string saved from PREPARE to the executor. However, we could possibly do a special-case hack for EXECUTE; maybe ask prepare.c for the statement's string and stack that instead of the outer EXECUTE query string. Thoughts? regards, tom lane
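The pointer-equality trick Tom describes can be modeled compactly. This is not the pg_stat_statements C code, just an illustration of why identity comparison (rather than textual comparison) distinguishes "the surrounding utility statement's own execution" from a textually identical but distinct nested statement:

```python
# Model of stacking the utility statement's query-text pointer and
# comparing by identity in the executor hook. QueryText stands in for
# the C char * pointer; all names here are invented for illustration.
class QueryText:
    def __init__(self, text: str):
        self.text = text

utility_stack = []

def push_utility(q: QueryText):
    utility_stack.append(q)

def pop_utility():
    utility_stack.pop()

def is_same_statement(q: QueryText) -> bool:
    """Identity (pointer) comparison, deliberately not text equality."""
    return bool(utility_stack) and utility_stack[-1] is q

outer = QueryText("EXPLAIN ANALYZE SELECT 1")
copy = QueryText("EXPLAIN ANALYZE SELECT 1")  # equal text, distinct object

push_utility(outer)
print(is_same_statement(outer))  # True: skip, already counted by the utility stmt
print(is_same_statement(copy))   # False: a nested stmt, tracked separately
pop_utility()
```

The second check is the EXPLAIN ANALYZE scenario from the mail: a pre-executed SQL function could present equal text, but because it is a different object (pointer), it is still counted as an independent nested statement.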
Re: [HACKERS] Command Triggers patch v18
On Thu, Mar 29, 2012 at 8:30 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: I'll go make that happen, and still need input here. We first want to have command triggers on specific commands or ANY command, and we want to implement 3 places from where to fire them. Here's a new syntax proposal to cope with that:

  create command trigger before COMMAND_STEP of alter table execute procedure snitch();

- before the process utility switch, with only command tag and parse tree:
  create command trigger foo before start of alter table execute procedure snitch();

- before running the command, after having done basic error checks, security checks, name lookups and locking, with all information:
  create command trigger before execute of alter table execute procedure snitch();

- after having run the command:
  create command trigger foo before end of alter table execute procedure snitch();

One thought is that it might be better to say AT or ON or WHEN rather than BEFORE, since BEFORE END is just a little strange; and also because a future hook point might be something like permissions-checking, and we really want it to be AT permissions-checking time, not BEFORE permissions-checking time.
If I got to pick, I'd pick this syntax:

CREATE EVENT TRIGGER name
    ON event_name (event_subtype_name [, ...])
    EXECUTE PROCEDURE function_name(args);

Then you could eventually imagine:

CREATE EVENT TRIGGER prohibit_all_ddl ON ddl_command_start EXECUTE PROCEDURE throw_an_error();
CREATE EVENT TRIGGER prohibit_some_ddl ON ddl_command_start (alter_table) EXECUTE PROCEDURE throw_an_error();
CREATE EVENT TRIGGER replicate_table_drops ON sql_drop (table) EXECUTE PROCEDURE record_table_drop();
CREATE EVENT TRIGGER allow_whitelisted_extensions ON permissions_check (create_extension) EXECUTE PROCEDURE make_heroku_happy();
CREATE EVENT TRIGGER replicate_at_subcommands ON subcommand_start (alter_table) EXECUTE PROCEDURE record_alter_table_subcommand();

Piece by piece:

- I prefer EVENT to COMMAND, because the start or end or any other point during the execution of a command can be viewed as an event; however, it is pretty clear that not everything that can be viewed as an event upon which we might like triggers to fire can be viewed as a command, even if we have a somewhat well-defined notion of subcommands. I continue to think that there is no sense in which creating a sequence to back a serial column or dropping an object due to DROP .. CASCADE is a subcommand; it is, however, pretty clearly an event. Anything we want can be an event. I think if we don't get this right in the first version we're going to be stuck with a misnomer forever. :-(

- It is pretty clear that for triggers that are supposed to fire at the start or end of a command, it's useful to be able to specify which commands it fires for. However, there must be other types of events (as in my postulated sql_drop case above) where the name of the command is not relevant - people aren't going to find all the objects that got dropped as a result of a toplevel drop table command; they're going to want to find all the tables that got dropped regardless of which incantation did it.
Also keep in mind that you can do things like use ALTER TABLE to rename a view (and there are other cases of that sort of confusion with domains vs. types, aggregates vs. functions, etc), so being able to filter based on object type seems clearly better in a bunch of real-world cases. And, even if you don't believe that specific example, a general syntax leaves us with freedom to maneuver if we discover other needs in the future. Writing alter_table etc. as tokens rather than as ALTER TABLE also has the further advantages of (1) greatly decreasing the parser footprint of this feature and (2) relieving everyone who adds commands in the future of the need to also change the command-trigger grammar. - I don't think that triggers at what Dimitri is calling before execute are a good idea at all for the first version of this feature, because there is a lot of stuff that might break and examining that should really be a separate project from getting the basic infrastructure in place. However, I also don't like the name. Unlike queries, DDL commands don't have clearly separate planning and execution phases, so saying that something happens before execution isn't really very clear. As previously discussed, the hook points in the latest patch are not all entirely consistent between one command and the next, not only in the larger ways I've already pointed out but also in some smaller ways. Different commands do different kinds of integrity checks and the firing points are not always in exactly the same place in that sequence. Of course, not all the commands do all the checks in the same order, so some possibly-not-trivial refactoring is probably required to get that consistent. Moreover, it doesn't seem out of reach to
Re: [HACKERS] Command Triggers patch v18
Robert Haas robertmh...@gmail.com writes: On Thu, Mar 29, 2012 at 9:01 AM, Thom Brown t...@linux.com wrote: On 29 March 2012 13:30, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: I'll go make that happen, and still need input here. We first want to have command triggers on specific commands or ANY command, and we want to implement 3 places from where to fire them. Here's a new syntax proposal to cope with that: Is it necessary to add this complexity in this version? Can't we keep it simple but in a way that allows the addition of this later? The testing of all these new combinations sounds like a lot of work. I concur. This is way more complicated than we should be trying to do in version 1.

I'm at a loss here. This proposal was so that we can reach a commonly agreed minimal solution and design in first version. There's no new piece of infrastructure to add, the syntax is changed only to open the road for later, I'm not changing where the current command triggers are to be called (except for those which are misplaced). So, please help me here: what do we want to have in 9.3?

Regards,

-- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On Thu, Mar 29, 2012 at 11:23 AM, Tom Lane t...@sss.pgh.pa.us wrote: It would make more sense to me to go the other way, that is suppress creation of a separate entry for the contained optimizable statement. The stats will still be correctly accumulated into the surrounding statement (or at least, if they are not, that's a separate pre-existing bug). If we do it in the direction you suggest, we'll fail to capture costs incurred outside execution of the contained statement. All things being equal, I completely agree. However, ISTM that the difficulty of implementation might be higher for your proposal, for the reasons you go on to state. If getting it right means that other significant features are not going to get committed at all for 9.2, I think we could leave this as a TODO. Right now, we already have logic in there to track nesting of statements in a primitive way, that is just count the nesting depth. My first idea about fixing this was to tweak that logic so that it stacks a flag saying we're in a utility statement that contains an optimizable statement, and then the first layer of Executor hooks that sees that flag set would know to not do anything. However this isn't quite good enough because that first layer might not be for the same statement. As an example, in an EXPLAIN ANALYZE the planner might pre-execute immutable or stable SQL functions before we reach the executor. We would prefer that any statements embedded in such a function still be seen as independent nested statements. However, I think there is a solution for that, though it may sound a bit ugly. Rather than just stacking a flag, let's stack the query source text pointer for the utility statement. Then in the executor hooks, if that pointer is *pointer* equal (not strcmp equal) to the optimizable statement's source-text pointer, we know we are executing the same statement as the surrounding utility command, and we do nothing. Without wishing to tick you off, that sounds both ugly and fragile. 
Can't we find a way to set the stacked flag (on the top stack frame) after planning and before execution? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Robert Haas robertmh...@gmail.com writes: On Thu, Mar 29, 2012 at 11:23 AM, Tom Lane t...@sss.pgh.pa.us wrote: However, I think there is a solution for that, though it may sound a bit ugly. Rather than just stacking a flag, let's stack the query source text pointer for the utility statement. Then in the executor hooks, if that pointer is *pointer* equal (not strcmp equal) to the optimizable statement's source-text pointer, we know we are executing the same statement as the surrounding utility command, and we do nothing. Without wishing to tick you off, that sounds both ugly and fragile. What do you object to --- the pointer-equality part? We could do strcmp comparison instead, on the assumption that a utility command could not look the same as an optimizable statement except in the case we care about. I think that's probably unnecessary though. Can't we find a way to set the stacked flag (on the top stack frame) after planning and before execution? That would require a way for pg_stat_statements to get control at rather random places in several different types of utility statements. And if we did add hook functions in those places, you'd still need to have sufficient stacked context for those hooks to know what to do, which leads you right back to this I think. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers patch v18
On 29 March 2012 16:30, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Mar 29, 2012 at 9:01 AM, Thom Brown t...@linux.com wrote: On 29 March 2012 13:30, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: I'll go make that happen, and still need input here. We first want to have command triggers on specific commands or ANY command, and we want to implement 3 places from where to fire them. Here's a new syntax proposal to cope with that: Is it necessary to add this complexity in this version? Can't we keep it simple but in a way that allows the addition of this later? The testing of all these new combinations sounds like a lot of work. I concur. This is way more complicated than we should be trying to do in version 1. I'm at a loss here. This proposal was so that we can reach a commonly agreed minimal solution and design in first version. There's no new piece of infrastructure to add, the syntax is changed only to open the road for later, I'm not changing where the current command triggers are to be called (except for those which are misplaced). So, please help me here: what do we want to have in 9.3? Perhaps I misunderstood. It was the addition of the fine-grained event options (parse, execute, etc.) I saw as new. If you're saying those are just possible options for later that won't be in this version, I'm fine with that. If those are to make it for 9.2, then creating the necessary test cases and possible fixes sounds infeasible in such a short space of time. Please disregard if this is not the case. -- Thom
Re: [HACKERS] poll: CHECK TRIGGER?
On 28.03.2012 23:54, Pavel Stehule wrote: 2012/3/28 Heikki Linnakangas heikki.linnakan...@enterprisedb.com: In prepare_expr(), you use a subtransaction to catch any ERRORs that happen during parsing the expression. That's a good idea, and I think many of the check_* functions could be greatly simplified by adopting a similar approach. Just ereport() any errors you find, and catch them at the appropriate level, appending the error to the output string. Your current approach of returning true/false depending on whether there were any errors seems tedious. This is not possible when we want to enable fatal_errors = false checking. I can use a subtransaction in prepare_expr, because it is the deepest level, but I cannot use it elsewhere, because I cannot handle the exception and continue with the other parts of the statement. Well, you can continue on the next statement. That's almost as good. In practice, if there's one error in a statement, it seems unlikely that you would correctly diagnose other errors on the same line. They're more likely to be fallout of the same mistake. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
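The fatal_errors = false behavior being debated above can be sketched in standalone C (this is an illustration of the control flow only, not the plpgsql_check code; all names are invented). Each statement's check runs as if in its own subtransaction, so an error is recorded and checking continues, instead of aborting at the first problem:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-statement check: returns an error message or NULL. */
typedef const char *(*check_fn)(void);

const char *check_ok(void)   { return NULL; }
const char *check_bad_if(void)   { return "error in IF branch"; }
const char *check_bad_else(void) { return "error in ELSE branch"; }

/*
 * Check every statement. With fatal_errors = false we keep going after an
 * error (as if each check ran in its own subtransaction), collecting all
 * diagnostics; with fatal_errors = true we stop at the first one.
 * Returns the number of errors collected into errors_out.
 */
int
check_statements(check_fn *stmts, int nstmts, bool fatal_errors,
                 const char **errors_out)
{
    int nerrors = 0;

    for (int i = 0; i < nstmts; i++)
    {
        const char *err = stmts[i]();   /* subtransaction boundary */

        if (err != NULL)
        {
            errors_out[nerrors++] = err;
            if (fatal_errors)
                break;                  /* abort on first error */
        }
    }
    return nerrors;
}
```

With fatal_errors = false, an error in the IF branch no longer hides an independent error in the ELSE branch, which is exactly Pavel's point.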
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On Thu, Mar 29, 2012 at 11:42 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Mar 29, 2012 at 11:23 AM, Tom Lane t...@sss.pgh.pa.us wrote: However, I think there is a solution for that, though it may sound a bit ugly. Rather than just stacking a flag, let's stack the query source text pointer for the utility statement. Then in the executor hooks, if that pointer is *pointer* equal (not strcmp equal) to the optimizable statement's source-text pointer, we know we are executing the same statement as the surrounding utility command, and we do nothing. Without wishing to tick you off, that sounds both ugly and fragile. What do you object to --- the pointer-equality part? We could do strcmp comparison instead, on the assumption that a utility command could not look the same as an optimizable statement except in the case we care about. I think that's probably unnecessary though. The pointer equality part seems like the worst ugliness, yes. Can't we find a way to set the stacked flag (on the top stack frame) after planning and before execution? That would require a way for pg_stat_statements to get control at rather random places in several different types of utility statements. And if we did add hook functions in those places, you'd still need to have sufficient stacked context for those hooks to know what to do, which leads you right back to this I think. What I'm imagining is that instead of just having a global for nested_level, you'd have a global variable pointing to a linked list. The length of the list would be equal to what we currently call nested_level + 1. Something like this:

typedef struct pgss_nesting_info
{
    struct pgss_nesting_info *next;
    int flag;    /* bad name */
} pgss_nesting_info;

static pgss_nesting_info *pgss_stack_top;

So any test for nesting_depth == 0 would instead test pgss_stack_top->next == NULL. Then, when you get control at the relevant spots, you set pgss_stack_top->flag = 1 and that's it.
Now, maybe it's too ugly to think about passing control at those spots; I'm surprised there's not a central point they all go through... Another thought is: if we simply treated these as nested queries for all purposes, would that really be so bad? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Robert Haas robertmh...@gmail.com writes: What I'm imagining is that instead of just having a global for nested_level, you'd have a global variable pointing to a linked list. This is more or less what I have in mind, too, except I do not believe that a mere boolean flag is sufficient to tell the difference between an executor call that you want to suppress logging for and one that you do not. You need some more positive way of identifying the target statement than that, and what I propose that be is the statement's query string. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
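Tom's combination of the two ideas above — a per-nesting-level stack frame that carries the utility statement's query string rather than a bare flag — can be sketched as standalone C (an illustration only, not pg_stat_statements code; all names are invented):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * One stack frame per nesting level. Instead of a bare boolean, each
 * frame remembers the query-string pointer of the utility statement
 * being executed at that level (NULL if none).
 */
typedef struct pgss_frame
{
    struct pgss_frame *next;
    const char *query_text;
} pgss_frame;

pgss_frame *pgss_stack_top = NULL;

void
push_frame(const char *query_text)
{
    pgss_frame *f = malloc(sizeof(pgss_frame));

    f->next = pgss_stack_top;
    f->query_text = query_text;
    pgss_stack_top = f;
}

void
pop_frame(void)
{
    pgss_frame *f = pgss_stack_top;

    pgss_stack_top = f->next;
    free(f);
}

/*
 * Executor-hook side: skip accounting when the statement being executed
 * is the very one the surrounding utility command wrapped, detected by
 * pointer (not strcmp) equality on the source text. A statement with a
 * different source text (e.g. one pre-executed inside a SQL function)
 * is still tracked as an independent nested statement.
 */
bool
should_track(const char *stmt_text)
{
    return pgss_stack_top == NULL
        || pgss_stack_top->query_text != stmt_text;
}
```

This is why the stacked query string positively identifies the target statement where a flag cannot: two different statements can both be "one level down", but only the contained statement shares its source-text pointer with the surrounding utility command.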
Re: [HACKERS] Command Triggers patch v18
Robert Haas robertmh...@gmail.com writes: create command trigger before COMMAND_STEP of alter table execute procedure snitch(); One thought is that it might be better to say AT or ON or WHEN rather than BEFORE, since BEFORE END is just a little strange; and also because a future hook point might be something like permissions-checking, and we really want it to be AT permissions-checking time, not BEFORE permissions-checking time. Yeah I tried different spellings and almost sent a version using AT or WHEN, but it appeared to me not to be specific enough: AT permission checking time does not exist, it either happens before or after permission checking. I played with using “preceding” rather than before, too, maybe you would like that better. So I would document the different steps and their ordering, then use only BEFORE as a qualifier, and add a pseudo step which is the end of the command. If I got to pick, I'd pick this syntax:

CREATE EVENT TRIGGER name ON event_name (event_subtype_name [, ...]) EXECUTE PROCEDURE function_name(args);

Then you could eventually imagine:

CREATE EVENT TRIGGER prohibit_all_ddl ON ddl_command_start EXECUTE PROCEDURE throw_an_error();
CREATE EVENT TRIGGER prohibit_some_ddl ON ddl_command_start (alter_table) EXECUTE PROCEDURE throw_an_error();
CREATE EVENT TRIGGER replicate_table_drops ON sql_drop (table) EXECUTE PROCEDURE record_table_drop();
CREATE EVENT TRIGGER allow_whitelisted_extensions ON permissions_check (create_extension) EXECUTE PROCEDURE make_heroku_happy();
CREATE EVENT TRIGGER replicate_at_subcommands ON subcommand_start (alter_table) EXECUTE PROCEDURE record_alter_table_subcommand();

I would amend that syntax to allow for a WHEN expr much like in the DML trigger case, where the expression can play with the variables exposed at the time of calling the trigger.
create event trigger prohibit_some_ddl preceding timing spec when tag in ('CREATE TABLE', 'ALTER TABLE') execute procedure throw_an_error();

We must also take some time to describe the timing specs carefully, their order of operation and which information is available for triggers in each of them. See below. I also don't like the way you squeeze in the drop cascade support here by requalifying it as a timing spec, which it's clearly not. On subcommand_start, what is the tag set to? the main command or the subcommand? if the former, where do you find the current subcommand tag? I don't think subcommand_start should be a timing spec either. Piece by piece: - I prefer EVENT to COMMAND, because the start or end or any other point during the execution of a command can be viewed as an event; however, it is pretty clear that not everything that can be viewed as an event upon which we might like triggers to fire can be viewed as a command, even if we have a somewhat well-defined notion of subcommands. I continue to think that there is no sense in which creating a sequence to back a serial column or dropping an object due to DROP .. CASCADE is a subcommand; it is, however, pretty clearly an event. Anything we want can be an event. I think if we don't get this right in the first version we're going to be stuck with a misnomer forever. :-( I can buy naming what we're building here EVENT TRIGGERs, albeit with some caveats: what I implemented here only caters for commands for which we have a distinct parse tree. That's what subcommands are in effect, and why it supports CREATE SCHEMA subcommands but not all ALTER TABLE variants, and not cascaded drops. Also, implementing cascade operations as calling process utility with a complete parse tree is no small project on its own.
We can also punt that and document that triggers fired on cascaded drops are not receiving a parse tree, only the command tag and object id and object name and schema name, or even just the command tag and object id in order not to incur too many additional name lookups. Lastly, EVENT opens the road to thinking about transaction event triggers, on begin|commit|rollback etc. Do we want to take that road? - It is pretty clear that for triggers that are supposed to fire at the start or end of a command, it's useful to be able to specify which commands it fires for. However, there must be other types of events (as in my postulated sql_drop case above) where the name of the command is not relevant - people aren't going to find all the objects that got dropped as a result of a toplevel drop table command; they're going to want to find all the tables that got dropped regardless of which incantation did it. Also keep in mind that you can do things like use ALTER TABLE to rename a view (and there are other cases of that sort of confusion with domains vs. types, aggregates vs. functions, etc), so being able to filter based on object type seems clearly better in a bunch of real-world cases. And, even if you don't believe that specific
Re: [HACKERS] Command Triggers patch v18
On Thu, Mar 29, 2012 at 11:49 AM, Thom Brown t...@linux.com wrote: On 29 March 2012 16:30, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Mar 29, 2012 at 9:01 AM, Thom Brown t...@linux.com wrote: On 29 March 2012 13:30, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: I'll go make that happen, and still need input here. We first want to have command triggers on specific commands or ANY command, and we want to implement 3 places from where to fire them. Here's a new syntax proposal to cope with that: Is it necessary to add this complexity in this version? Can't we keep it simple but in a way that allows the addition of this later? The testing of all these new combinations sounds like a lot of work. I concur. This is way more complicated than we should be trying to do in version 1. I'm at a loss here. This proposal was so that we can reach a commonly agreed minimal solution and design in first version. There's no new piece of infrastructure to add, the syntax is changed only to open the road for later, I'm not changing where the current command triggers are to be called (except for those which are misplaced). So, please help me here: what do we want to have in 9.3? Perhaps I misunderstood. It was the addition of the fine-grained even options (parse, execute etc) I saw as new. If you're saying those are just possible options for later that won't be in this version, I'm fine with that. If those are to make it for 9.2, then creating the necessary test cases and possible fixes sounds infeasible in such a short space of time. Please disregard if this is not the case. So... I've said repeatedly and over a long period of time that development of this feature wasn't started early enough in the cycle to get it finished in time for 9.2. 
I think that I've identified some pretty serious issues in the discussion we've had so far, especially (1) the points at which triggers are fired aren't consistent between commands, (2) not much thought has been given to what happens if, say, a DDL trigger performs a DDL operation on the table the outer DDL command is due to modify, and (3) we are eventually going to want to trap a much richer set of events than can be captured by the words "before" and "after". Now, you could view this as me throwing up roadblocks to validate my previously-expressed opinion that this wasn't going to get done, but I really, honestly believe that these are important issues and that getting them right is more important than getting something done now. If we still want to try to squeeze something into 9.2, I recommend stripping out everything except for what Dimitri called the before-any-command firing point. In other words, add a way to run a procedure after parsing of a command but before any name lookups have been done, any permissions checks have been done, or any locks have been taken. The usefulness of such a hook is of course limited but it is also a lot less invasive than the patch I recently reviewed and probably a lot safer. I actually think it's wise to do that as a first step even if it doesn't make 9.2, because it is much easier to build features like this incrementally and even a patch that does that will be reasonably complicated and difficult to review. Parenthetically, what Dimitri previously called the after-any-command firing point, all the way at the end of the statement but without any specific details about the object the statement operated on, seems just as good for a first step, maybe better, so that would be a fine foundation from my point of view as well. The stuff that happens somewhere in the middle, even just after locking and permissions checking, is more complex and I think that should be phase 2 regardless of which phase ends up in which release cycle.
I'm not sure whether Dimitri's question about 9.3 was a typo for 9.2 or whether that's what he was actually asking about. But to answer that question for 9.3, I think we have time to build an extremely extensive infrastructure that covers a huge variety of use cases, and I think there is hardly any reasonable proposal that is off the table as far as that goes. We have a year to get that work done, and a year is a long time, and with incremental progress at each CommitFest we can do a huge amount. I can also say that I would be willing to put in some serious work during the 9.3 cycle to accelerate that progress, too, especally if (ahem) some other people can return the favor by reviewing my patches. :-) I could see us adding the functionality described above in one CommitFest and then spending the next three adding more whiz-bango frammishes and ending up with something really nice. Right now, though, we are very crunched for time, and probably shouldn't be entertaining anything that requires a tenth the code churn that this patch probably does; if we are going to do anything at all, it had better be as simple and uninvasive as we
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
[ forgot to respond to this bit ] Robert Haas robertmh...@gmail.com writes: Another thought is: if we simply treated these as nested queries for all purposes, would that really be so bad? That was actually what I suggested first, and now that I look at the code, that's exactly what's happening right now. However, Peter pointed out --- I think rightly --- that this fails to satisfy the principle of least astonishment, because the user doesn't think of these statements as being utility statements with other statements wrapped inside. Users are going to expect these cases to be treated as single statements. The issue existed before this patch, BTW, but was partially masked by the fact that we grouped pg_stat_statements view entries strictly by query text, so that both the utility statement and the contained optimizable statement got matched to the same table entry. The execution costs were getting double-counted, but apparently nobody noticed that. As things now stand, the utility statement and contained statement show up as distinct table entries (if you have nested-statement tracking enabled) because of the different hashing methods used. And it's those multiple table entries that seem confusing, especially since they are counting mostly the same costs. [ time passes ... ] Hm ... I just had a different idea. I need to go look at the code again, but I believe that in the problematic cases, the post-analyze hook does not compute a queryId for the optimizable statement. This means that it will arrive at the executor with queryId zero. What if we simply made the executor hooks do nothing when queryId is zero? (Note that this also means that in the problematic cases, the behavior is already pretty wrong because executor costs for *all* statements of this sort are getting merged into one hashtable entry for hash zero.) 
regards, tom lane
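Tom's "do nothing when queryId is zero" idea reduces to a very small change in the executor hooks. A standalone sketch (an illustration of the idea only, not the pg_stat_statements source; names are invented):

```c
#include <assert.h>
#include <stdint.h>

/* Number of statements the mock hook has decided to track. */
int tracked_executions = 0;

/*
 * Sketch of an ExecutorStart-style hook: if the post-analyze hook never
 * computed a queryId for this statement (it is still zero), do nothing.
 * The statement's costs then accrue only to the surrounding utility
 * statement's entry, instead of all such statements being merged into
 * one bogus hash-zero entry.
 */
void
pgss_executor_start_sketch(uint64_t queryId)
{
    if (queryId == 0)
        return;                 /* contained statement: skip */
    tracked_executions++;
}
```

The appeal of this variant is that it needs no stacked context at all: the absence of a computed queryId is itself the marker that the statement is contained in a utility command.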
Re: [HACKERS] Command Triggers patch v18
Robert Haas robertmh...@gmail.com writes: I've said repeatedly and over a long period of time that development of this feature wasn't started early enough in the cycle to get it finished in time for 9.2. I think that I've identified some pretty That could well be, yeah. serious issues in the discussion we've had so far, especially (1) the points at which triggers are fired aren't consistent between commands, (2) not much thought has been given to what happens if, say, a DDL trigger performs a DDL operation on the table the outer DDL command is due to modify, and (3) we are eventually going to want to trap a much richer set of events than can be captured by the words "before" and "after". Now, you could view this as me throwing up roadblocks to validate my previously-expressed opinion that this wasn't going to get done, but I really, honestly believe that these are important issues and that getting them right is more important than getting something done now. (1) is not hard to fix as soon as we set a policy, which is the most pressing thing we need to do, in my mind, whatever we manage to commit in 9.2. (2) can be addressed with a simple blacklist that would be set just before calling user-defined code, and cleaned when done running it. Once in place, the blacklist lookup is easy to implement in a central place (utility.c or cmdtrigger.c), with an ereport() when the current command is in the blacklist. E.g. ALTER TABLE would blacklist ALTER TABLE and DROP TABLE commands on the current object id. (3) we need to continue designing that, yes. I think we can have a first set of events defined now, even if we don't implement support for all of them readily. If we still want to try to squeeze something into 9.2, I recommend stripping out everything except for what Dimitri called the before-any-command firing point. In other words, add a way to run a I would like us to make that consistent rather than throw it away, or maybe salvage the part of the commands we support here.
It's easy enough, if tiresome, to document which commands are supported in which event timing. procedure after parsing of a command but before any name lookups have been done, any permissions checks have been done, or any locks have been taken. The usefulness of such a hook is of course limited but it is also a lot less invasive than the patch I recently reviewed and probably a lot safer. I actually think it's wise to do that as a first step even if it doesn't make 9.2, because it is much easier to build features like this incrementally and even a patch that does that will be reasonably complicated and difficult to review. Yes. Parenthetically, what Dimitri previously called the after-any-command firing point, all the way at the end of the statement but without any specific details about the object the statement operated on, seems just as good for a first step, maybe better, so that would be a fine foundation from my point of view as well. The stuff that happens Now that fetching the information is implemented, I guess that we could still provide for it when firing event triggers at that timing spec. Of course that means a bigger patch to review when compared to not having the feature, but Thom did spend loads of time to test this part of the implementation. somewhere in the middle, even just after locking and permissions checking, is more complex and I think that should be phase 2 regardless of which phase ends up in which release cycle. Mmmm, ok. I'm not sure whether Dimitri's question about 9.3 was a typo for 9.2 Typo :) I appreciate your offer, though :) Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
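Dimitri's proposed blacklist for point (2) — registering (command tag, object id) pairs before running user-defined trigger code, and checking them centrally before executing a command — can be sketched as standalone C (an illustration only, not patch code; all names are invented):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_BLACKLIST 8

/*
 * Reentrancy guard: while user-defined trigger code runs, the outer DDL
 * command blacklists operations on the object it is about to modify.
 * A central dispatcher (utility.c in the proposal) would consult this
 * and ereport() on a hit; here we just return a boolean.
 */
typedef struct
{
    const char *tag;            /* command tag, e.g. "ALTER TABLE" */
    unsigned oid;               /* target object id */
} bl_entry;

static bl_entry blacklist[MAX_BLACKLIST];
static int bl_len = 0;

void
bl_add(const char *tag, unsigned oid)
{
    blacklist[bl_len].tag = tag;
    blacklist[bl_len].oid = oid;
    bl_len++;
}

/* Cleared once the user-defined code has finished running. */
void
bl_clear(void)
{
    bl_len = 0;
}

bool
bl_hit(const char *tag, unsigned oid)
{
    for (int i = 0; i < bl_len; i++)
        if (blacklist[i].oid == oid && strcmp(blacklist[i].tag, tag) == 0)
            return true;
    return false;
}
```

Per the example in the message, an outer ALTER TABLE would call bl_add("ALTER TABLE", reloid) and bl_add("DROP TABLE", reloid) before firing triggers, and bl_clear() afterwards, so a trigger recursing into the same table is rejected while commands on other objects still pass.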
Re: [HACKERS] poll: CHECK TRIGGER?
2012/3/29 Heikki Linnakangas heikki.linnakan...@enterprisedb.com: On 28.03.2012 23:54, Pavel Stehule wrote: 2012/3/28 Heikki Linnakangas heikki.linnakan...@enterprisedb.com: In prepare_expr(), you use a subtransaction to catch any ERRORs that happen during parsing the expression. That's a good idea, and I think many of the check_* functions could be greatly simplified by adopting a similar approach. Just ereport() any errors you find, and catch them at the appropriate level, appending the error to the output string. Your current approach of returning true/false depending on whether there were any errors seems tedious. This is not possible when we want to enable fatal_errors = false checking. I can use a subtransaction in prepare_expr, because it is the deepest level, but I cannot use it elsewhere, because I cannot handle the exception and continue with the other parts of the statement. Well, you can continue on the next statement. That's almost as good. In practice, if there's one error in a statement, it seems unlikely that you would correctly diagnose other errors on the same line. They're more likely to be fallout of the same mistake. No; for example, if I find an error in the IF/THEN statements, I still want to check the ELSE statements. Regards, Pavel
Re: [HACKERS] pgxs and bison, flex
On Wed, 2012-03-28 at 22:12 -0400, Tom Lane wrote: Peter Eisentraut pete...@gmx.net writes: I propose that we apply the attached patch to make sure those variables are set to a usable default value in any case. Won't this break usages such as in contrib/cube?

cubeparse.c: cubeparse.y
ifdef BISON
	$(BISON) $(BISONFLAGS) -o $@ $<
else
	@$(missing) bison $< $@
endif

No, the code in my patch is conditional on 'ifdef PGXS'. (Not visible in the patch, unfortunately.) I don't think we want to support external use of $(missing), since the text it refers to is specific to PostgreSQL's distribution mechanisms. IMO, people building distribution packages in clean chroots are best advised to include bison and flex even if they're not strictly necessary. Otherwise the build could fall over unexpectedly and unnecessarily, depending on file timestamps and other phase-of-the-moon issues. If the package maker has adopted that elementary bit of self-defense, the whole thing is a non-problem. I don't agree with that. If this were a problem, dozens of packages would be liable to break randomly, and this knowledge would have passed into best practices and policies decades ago. In any case, it won't really help because we can't enforce it, and we can't tell the average person who installs from source to install additional build dependencies that the installation instructions explicitly say are not needed.
Re: [HACKERS] pgxs and bison, flex
Peter Eisentraut pete...@gmx.net writes: On ons, 2012-03-28 at 22:12 -0400, Tom Lane wrote: Won't this break usages such as in contrib/cube? No, the code in my patch is conditional on 'ifdef PGXS'. (Not visible in the patch, unfortunately.) Oh, okay. I don't think we want to support external use of $(missing), since the text it refers to is specific to PostgreSQL's distribution mechanisms. Fair enough. Objection withdrawn. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers patch v18
On Thu, Mar 29, 2012 at 12:17 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Robert Haas robertmh...@gmail.com writes: create command trigger before COMMAND_STEP of alter table execute procedure snitch(); One thought is that it might be better to say AT or ON or WHEN rather than BEFORE, since BEFORE END is just a little strange; and also because a future hook point might be something like permissions-checking, and we really want it to be AT permissions-checking time, not BEFORE permissions-checking time. Yeah I tried different spellings and almost sent a version using AT or WHEN, but it appeared to me not to be specific enough: AT permission checking time does not exist, it either happens before or after permission checking. I played with using “preceding” rather than before, too, maybe you would like that better. Well, preceding and before are synonyms, so I don't see any advantage in that change. But I really did mean AT permissions_checking time, not before or after it. That is, we'd have a hook where instead of doing something like this:

    aclresult = pg_class_aclcheck(reloid, GetUserId(), ACL_SELECT);

...we'd call a superuser-supplied trigger function and let it make the call. This requires a clear separation of permissions and integrity checking that we are currently lacking, and probably quite a bit of other engineering too, but I think we should assume that we're going to do it at some point because it seems pretty clear that there is a huge amount of pent-up demand for being able to do things like this.
If I got to pick, I'd pick this syntax: CREATE EVENT TRIGGER name ON event_name (event_subtype_name [, ...]) EXECUTE PROCEDURE function_name(args); Then you could eventually imagine: CREATE EVENT TRIGGER prohibit_all_ddl ON ddl_command_start EXECUTE PROCEDURE throw_an_error(); CREATE EVENT TRIGGER prohibit_some_ddl ON ddl_command_start (alter_table) EXECUTE PROCEDURE throw_an_error(); CREATE EVENT TRIGGER replicate_table_drops ON sql_drop (table) EXECUTE PROCEDURE record_table_drop(); CREATE EVENT TRIGGER allow_whitelisted_extensions ON permissions_check (create_extension) EXECUTE PROCEDURE make_heroku_happy(); CREATE EVENT TRIGGER replicate_at_subcommands ON subcommand_start (alter_table) EXECUTE PROCEDURE record_alter_table_subcommand(); I would amend that syntax to allow for a WHEN expr much like in the DML trigger case, where the expression can play with the variables exposed at the time of calling the trigger. create event trigger prohibit_some_ddl preceding timing spec when tag in ('CREATE TABLE', 'ALTER TABLE') execute procedure throw_an_error(); We could do it that way, but the tag in ('CREATE TABLE', 'ALTER TABLE') syntax will be tricky to evaluate. I'd really prefer NOT to need to start up and shut down the executor to figure out whether the trigger has to fire. If we are going to do it that way then we need to carefully define what variables are available and what values they get. I think that most people's event-filtering needs will be simple enough to be served by a more declarative syntax. We must also take some time to describe the timing specs carefully, their order of operation and which information are available for triggers in each of them. See below. I also don't like the way you squeeze in the drop cascade support here by re qualifying it as a timing spec, which it's clearly not. On subcommand_start, what is the tag set to? the main command or the subcommand? if the former, where do you find the current subcommand tag? 
I don't think subcommand_start should be a timing spec either. That's up for grabs, but I was thinking the subcommand. For example, consider: alter table t alter column a add column b int, disable trigger all; I would imagine this having a firing sequence something like this: ddl_command_start:alter_table ddl_command_permissions_check:alter_table ddl_command_name_lookup:alter_table alter_table_subcommand_prep:add_column alter_table_subcommand_prep:disable_trigger_all alter_table_subcommand_start:add_column sql_create:column alter_table_subcommand_end:add_column alter_table_subcommand_start:disable_trigger_all alter_table_subcommand_end:disable_trigger_all ddl_command_end:alter_table Lots of room for argument there, of course. Piece by piece: - I prefer EVENT to COMMAND, because the start or end or any other point during the execution of a command can be viewed as an event; however, it is pretty clear that not everything that can be viewed as an event upon which we might like triggers to fire can be viewed as a command, even if we have a somewhat well-defined notion of subcommands. I continue to think that there is no sense in which creating a sequence to back a serial column or dropping an object due to DROP .. CASCADE is a subcommand; it is, however, pretty clearly an event. Anything we want can be an event. I think if we don't get
Re: [HACKERS] ECPG FETCH readahead
On Thu, Mar 29, 2012 at 12:59:40PM +0200, Boszormenyi Zoltan wrote: On 2012-03-29 02:43, Noah Misch wrote: On Sat, Mar 24, 2012 at 10:49:07AM +0100, Boszormenyi Zoltan wrote: +to <literal>READAHEAD number</literal>. Explicit <literal>READAHEAD number</literal> or +<literal>NO READAHEAD</literal> turns cursor readahead on (with <literal>number</literal> +as the size of the readahead window) or off for the specified cursor, +respectively. For cursors using a non-0 readahead window size is 256 rows, The number 256 is no longer present in your implementation. Indeed. Oversight, that part of the sentence is deleted. +the window size may be modified by setting the <literal>ECPGFETCHSZ</literal> +environment variable to a different value. I had in mind that DECLARE statements adorned with READAHEAD syntax would always behave precisely as written, independent of ECPGFETCHSZ or ecpg -R. Unadorned DECLARE statements would use ECPGFETCHSZ if set, then the value passed to ecpg -R if provided, and finally a value of 1 (no readahead) in the absence of both ECPGFETCHSZ and ecpg -R. Did you do it differently for a particular reason? I don't particularly object to what you've implemented, but I'd be curious to hear your thinking. What I had in mind was: NO READAHEAD == READAHEAD 0. This will translate into 1 tuple sized readahead window that cannot be modified by ECPGFETCHSZ. READAHEAD 1 also means uncached by default but ECPGFETCHSZ may modify the readahead window size. To me, it feels too magical to make READAHEAD 1 and READAHEAD 0 differ only in whether ECPGFETCHSZ can override. I'm willing to go with whatever consensus arises on this topic, though. This part is policy that can be debated and modified accordingly, it doesn't affect the internals of the caching code. Agreed.
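The precedence Noah proposes for choosing a cursor's effective readahead window can be stated compactly. The following is a toy model of that proposal only (not the committed ECPG behavior, and the argument names are invented for illustration): an adorned DECLARE always wins; otherwise ECPGFETCHSZ, then the ecpg -R value, then 1.

```python
def effective_readahead(declared=None, ecpgfetchsz=None, ecpg_r=None):
    """Resolve a cursor's readahead window size under Noah's proposed rules.

    declared    -- value from READAHEAD n / NO READAHEAD on the DECLARE,
                   or None if the DECLARE was unadorned
    ecpgfetchsz -- ECPGFETCHSZ environment variable, or None if unset
    ecpg_r      -- value passed to ecpg -R, or None if not provided
    """
    if declared is not None:
        # Adorned DECLARE behaves exactly as written; NO READAHEAD (0)
        # still needs a window of one row to fetch anything at all.
        return max(declared, 1)
    for value in (ecpgfetchsz, ecpg_r):
        if value is not None:
            return max(value, 1)
    return 1  # neither ECPGFETCHSZ nor ecpg -R: no readahead
```

The point of contention in the thread (whether READAHEAD 0 and READAHEAD 1 should differ only in ECPGFETCHSZ overridability) corresponds to whether the first branch consults `ecpgfetchsz` when `declared == 1`.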
+</para> + +<para> +<command>UPDATE</command> or <command>DELETE</command> with the +<literal>WHERE CURRENT OF</literal> clause, cursor readahead may or may not +actually improve performance, as <command>MOVE</command> statement has to be +sent to the backend before the DML statement to ensure correct cursor position +in the backend. +</para> This sentence seems to be missing a word near its beginning. Sounds like a corner case to me, but I am not a native english speaker. Which one sounds better: UPDATE or DELETE with the WHERE CURRENT OF clause... or UPDATE or DELETE statements with the WHERE CURRENT OF clause... ? Here is the simplest change to make the original grammatical: Given an UPDATE or DELETE with the WHERE CURRENT OF clause, ... I might write the whole thing this way: To execute an UPDATE or DELETE statement bearing the WHERE CURRENT OF clause, ECPG may first execute an implicit MOVE to synchronize the server cursor position with the local cursor position. This can reduce or eliminate the performance benefit of readahead for affected cursors. one of the new sections about readahead should somehow reference the hazard around volatile functions. Done. I don't see the mention in your latest patch. You do mention it for the sqlerrd[2] compatibility stuff. +<varlistentry> +<term><literal>OPEN RETURNS LAST ROW POSITION</literal></term> +<term><literal>OPEN RETURNS 0 FOR LAST ROW POSITION</literal></term> +<listitem> +<para> + When the cursor is opened, it's possible to discover the size of the result set + using <command>MOVE ALL</command> which traverses the result set and move + the cursor back to the beginning using <command>MOVE ABSOLUTE 0</command>. + The size of the result set is returned in sqlca.sqlerrd[2]. +</para> + +<para> + This slows down opening the cursor from the application point of view + but may also have side effects. If the cursor query contains volatile function + calls with side effects, they will be evaluated twice because of traversing + the result set this way during <command>OPEN</command>.
+</para> + +<para> + The default is not to discover. +</para> I mildly oppose having syntax to enable this per-cursor. Readahead is a generally handy feature that I might use in new programs, but this feature is more of a hack for compatibility with some old Informix version. For new code, authors should do their own MOVEs instead of using this option. The patch also adds an undocumented ECPGOPENRETURNSRESULTSETSIZE environment variable to control this. I see no use for such a thing, because a program's dependency on sqlerrd[2] is fixed at build time. If a program does not reference sqlerrd[2] for a cursor, there's no benefit from choosing at runtime to populate that value anyway. If a program does reference it, any option needed to have it set correctly had better be set at build time and apply to every run of the final program. IOW, I suggest having only the ecpg-time option to enable this behavior. Thoughts? I wanted to make it available
Re: [HACKERS] Potential reference miscounts and segfaults in plpython.c
On Thu, Feb 23, 2012 at 8:35 PM, Daniele Varrazzo daniele.varra...@gmail.com wrote: On Mon, Feb 20, 2012 at 12:09 AM, Jan Urbański wulc...@wulczer.org wrote: BTW, that tool is quite handy, I'll have to try running it over psycopg2. Indeed. I'm having a play with it. It is reporting several issues to clean up (mostly on failure at module import). It's also tracebacking here and there: I'll send the author some feedback/patches. I'm patching psycopg in the gcc-python-plugin branch in my dev repos (https://github.com/dvarrazzo/psycopg). The just released psycopg 2.4.5 has had its codebase cleaned up using the gcc static checker. I haven't managed to run it without false positives, however it's been very useful to find missing return value checks and other nuances. No live memory leak has been found: there were a few missing decrefs but only at module init, not in the live code. Full report about the changes in this message: https://fedorahosted.org/pipermail/gcc-python-plugin/2012-March/000229.html. Cheers, -- Daniele -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers patch v18
Apropos of nothing and since I haven't found a particularly good time to say this amidst all the technical discussion, I appreciate very much all the work you've been putting into this. On Thu, Mar 29, 2012 at 12:42 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: serious issues in the discussion we've had so far, especially (1) the points at which triggers are fired aren't consistent between commands, (2) not much thought has been given to what happens if, say, a DDL trigger performs a DDL operation on the table the outer DDL command is due to modify, and (3) we are eventually going to want to trap a much richer set of events than can be captured by the words before and after. Now, you could view this as me throwing up roadblocks to validate my previously-expressed opinion that this wasn't going to get done, but I really, honestly believe that these are important issues and that getting them right is more important than getting something done now. (1) is not hard to fix as soon as we set a policy, which is the most pressing thing we need to do in my mind, whatever we manage to commit in 9.2. I agree that setting a policy is enormously important, not so sure about how easy it is to fix, but we shall see. My primary and overriding goal here is to make sure that we don't make any design decisions, in the syntax or otherwise, that foreclose the ability to add policies that we've not yet dreamed of to future versions of the code. It seems vanishingly unlikely that the first commit will cover all the use cases that people care about, and only slightly more likely that we'll even be able to list them all at that point. Perhaps I'm wrong and we already know what they are, but I'd like to bet against any of us - and certainly me - being that smart. In terms of policies, there are two that seem to me to be very clear: at the very beginning of ProcessUtility, and at the very end of it, *excluding* recursive calls to ProcessUtility that are intended to handle subcommands.
I think we could call the first start or command_start or ddl_command_start or post-parse or pre-locking or something along those lines, and the second end or command_end or ddl_command_end. I think it might be worth including ddl in there because the scope of the current patch really is just DDL (as opposed to say DCL) and having a separate set of firing points for DCL does not seem dumb. If we ever try to do anything with transaction control as you suggested upthread that's likely to also require special handling. Calling it, specifically, ddl_command_start makes the scope very clear. (2) can be addressed with a simple blacklist that would be set just before calling user-defined code, and cleaned when done running it. When in place, the blacklist lookup is easy to implement in a central place (utility.c or cmdtrigger.c) and ereport() when the current command is in the blacklist. e.g. alter table would blacklist alter table and drop table commands on the current object id. Here we're talking about a firing point that we might call ddl_post_name_lookup or something along those lines. I would prefer to handle this by making the following rules; here I'm assuming that we're talking about the case where the object in question is a relation: 1. The trigger fires immediately after RangeVarGetRelidExtended and before any other checks are performed, and especially before any CheckTableNotInUse(). This last is important, because what matters is whether the table's not in use AFTER all possible user-supplied code is executed. If the trigger opens or closes cursors, for example, what matters is whether there are any cursors left open *after* the triggers complete, not whether there were any open on entering the trigger. The same is true of any other integrity check: if there's a chance that the trigger could change something that affects whether the integrity constraint gets fired, then we'd better be darn sure to fire the trigger before we check the integrity constraint.
2. If we fire any triggers at this point, we CommandCounterIncrement() and then re-do RangeVarGetRelidExtended() with the same arguments we passed before. If it doesn't return the same OID we got the first time, we abort with some kind of serialization error. We need to be a little careful about the phrasing of the error message, because it's possible for this to change during command execution even without command triggers if somebody creates a table earlier in the search path with the same unqualified name that was passed to the command, but I think it's OK for the existence of a command-trigger to cause a spurious abort in that very narrow case, as long as the error message includes some appropriate weasel language. Alternatively, we could try to avoid that corner case by rechecking only that a tuple with the right OID is still present and still has the correct relkind, but that seems like it might be a little less
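The fire-then-recheck protocol in rules 1 and 2 can be modeled in a few lines. The sketch below is an illustration only: a dict stands in for the catalogs, plain callables stand in for command triggers, and the exception stands in for the proposed serialization error; none of the names are PostgreSQL APIs.

```python
class SerializationError(Exception):
    """Stand-in for the serialization failure Robert proposes."""

def locked_lookup_with_triggers(catalog, name, triggers):
    """Toy model of the proposed ddl_post_name_lookup protocol.

    catalog  -- dict mapping relation names to OIDs (stand-in for catalogs)
    name     -- the name the DDL command was given
    triggers -- user-supplied callables that may mutate the catalog
    """
    oid = catalog[name]          # initial RangeVarGetRelidExtended()
    for fire in triggers:        # fire triggers before any other checks
        fire(catalog)
    # Re-do the lookup with the same arguments; a different answer means
    # user code moved the target out from under us, so error out rather
    # than operate on the wrong object.
    if catalog.get(name) != oid:
        raise SerializationError(
            "relation %r changed during command trigger execution" % name)
    return oid
```

A benign trigger leaves the lookup stable; a trigger that drops and recreates the relation under the same name trips the recheck, which is exactly the spurious-but-safe abort discussed above.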
Re: [HACKERS] Finer Extension dependencies
On Mar 29, 2012, at 4:42 AM, Robert Haas wrote: 2. Add a new feature to the provides line with every release that does anything other than fix bugs, leading to: provides = foobar-1.1, foobar-1.2, foobar-1.3, foobar-1.4, foobar-1.5, foobar-1.6, foobar-2.0, foobar-2.1, foobar-2.2, foobar-2.3, foobar-3.0, foobar-3.1 This is what I have expected to do. In new releases of pgTAP, I’d probably just add version lines. I might give certain releases names, but probably not. I’m too lazy, and if a given release has more than one new feature, it’d be a bit silly. I’ve never been very keen on this approach, but then I don’t understand packaging systems very well, so it might rock, and I just don’t know how to use it properly. But I cannot tell. Best, David
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
I wrote: Hm ... I just had a different idea. I need to go look at the code again, but I believe that in the problematic cases, the post-analyze hook does not compute a queryId for the optimizable statement. This means that it will arrive at the executor with queryId zero. What if we simply made the executor hooks do nothing when queryId is zero? (Note that this also means that in the problematic cases, the behavior is already pretty wrong because executor costs for *all* statements of this sort are getting merged into one hashtable entry for hash zero.) The attached proposed patch does it that way. It makes the EXPLAIN, SELECT INTO, and DECLARE CURSOR cases behave as expected for utility statements. PREPARE/EXECUTE work a bit funny though: if you have track = all then you get EXECUTE cycles reported against both the EXECUTE statement and the underlying PREPARE. This is because when PREPARE calls parse_analyze_varparams the post-analyze hook doesn't know that this isn't a top-level statement, so it marks the query with a queryId. I don't see any way around that part without something like what I suggested before. However, this behavior seems to me to be considerably less of a POLA violation than the cases involving two identical-looking entries for self-contained statements, and it might even be thought to be a feature not a bug (since the PREPARE entry will accumulate totals for all uses of the prepared statement). So I'm satisfied with it for now. regards, tom lane diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 137b242..bebfff1 100644 *** a/contrib/pg_stat_statements/pg_stat_statements.c --- b/contrib/pg_stat_statements/pg_stat_statements.c *** pgss_post_parse_analyze(ParseState *psta *** 602,610 if (!pgss || !pgss_hash) return; ! 
/* We do nothing with utility statements at this stage */ if (query->utilityStmt) return; /* Set up workspace for query jumbling */ jstate.jumble = (unsigned char *) palloc(JUMBLE_SIZE); --- 602,620 if (!pgss || !pgss_hash) return; ! /* ! * Utility statements get queryId zero. We do this even in cases where ! * the statement contains an optimizable statement for which a queryId ! * could be derived (such as EXPLAIN or DECLARE CURSOR). For such cases, ! * runtime control will first go through ProcessUtility and then the ! * executor, and we don't want the executor hooks to do anything, since ! * we are already measuring the statement's costs at the utility level. ! */ if (query->utilityStmt) + { + query->queryId = 0; return; + } /* Set up workspace for query jumbling */ jstate.jumble = (unsigned char *) palloc(JUMBLE_SIZE); *** pgss_post_parse_analyze(ParseState *psta *** 619,624 --- 629,641 query->queryId = hash_any(jstate.jumble, jstate.jumble_len); /* + * If we are unlucky enough to get a hash of zero, use 1 instead, to + * prevent confusion with the utility-statement case. + */ + if (query->queryId == 0) + query->queryId = 1; + + /* * If we were able to identify any ignorable constants, we immediately * create a hash table entry for the query, so that we can record the * normalized form of the query string. If there were no such constants, *** pgss_ExecutorStart(QueryDesc *queryDesc, *** 649,655 else standard_ExecutorStart(queryDesc, eflags); ! if (pgss_enabled()) { /* * Set up to track total elapsed time in ExecutorRun. Make sure the --- 666,677 else standard_ExecutorStart(queryDesc, eflags); ! /* ! * If query has queryId zero, don't track it. This prevents double ! * counting of optimizable statements that are directly contained in ! * utility statements. ! */ ! if (pgss_enabled() && queryDesc->plannedstmt->queryId != 0) { /* * Set up to track total elapsed time in ExecutorRun.
Make sure the *** pgss_ExecutorFinish(QueryDesc *queryDesc *** 719,731 static void pgss_ExecutorEnd(QueryDesc *queryDesc) { ! if (queryDesc->totaltime && pgss_enabled()) ! { ! uint32 queryId; ! ! /* Query's ID should have been filled in by post-analyze hook */ ! queryId = queryDesc->plannedstmt->queryId; /* * Make sure stats accumulation is done. (Note: it's okay if several * levels of hook all do this.) --- 741,750 static void pgss_ExecutorEnd(QueryDesc *queryDesc) { ! uint32 queryId = queryDesc->plannedstmt->queryId; + if (queryId != 0 && queryDesc->totaltime && pgss_enabled()) + { /* * Make sure stats accumulation is done. (Note: it's okay if several * levels of hook all do this.)
Re: [HACKERS] Finer Extension dependencies
On Thu, Mar 29, 2012 at 1:48 PM, David E. Wheeler da...@justatheory.com wrote: On Mar 29, 2012, at 4:42 AM, Robert Haas wrote: 2. Add a new feature to the provides line with every release that does anything other than fix bugs, leading to: provides = foobar-1.1, foobar-1.2, foobar-1.3, foobar-1.4, foobar-1.5, foobar-1.6, foobar-2.0, foobar-2.1, foobar-2.2, foobar-2.3, foobar-3.0, foobar-3.1 This is what I have expected to do. In new releases of pgTAP, I’d probably just add version lines. I might give certain releases names, but probably not. I’m too lazy, and if a given release has more than one new feature, it’d be a bit silly. I’ve never been very keen on this approach, but then I don’t understand packaging systems very well, so it might rock, and I just don’t know how to use it properly. But I cannot tell. So the idea is that you're actually supposed to separately catalog each feature you added (e.g. each new function), so that people can depend specifically on those features. Then if you remove the function again in some distant future, you stop advertising that feature (but you can still advertise any other features you added in the same release). If you're not going to do that, then this feature as proposed is strictly worse than figuring out a way to compare version numbers, because it's more work, some people will not bother to update the provides line, and other people will sometimes forget it. I don't really have the foggiest idea how people using other packaging systems handle this. It seems like it would be a huge pain in the rear end to be continually adding Provides: lines to RPMs for every new feature that a new version of a package offers, not to mention that you'd hardly want the corresponding Requires: lines to have to enumerate all the public interfaces those packages used just in case one of them ever went away. I have a feeling I'm missing part of the picture here, somehow. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Finer Extension dependencies
Robert Haas robertmh...@gmail.com writes: provides = foobar-1.1, foobar-1.2, foobar-1.3, foobar-1.4, foobar-1.5, foobar-1.6, foobar-2.0, foobar-2.1, foobar-2.2, foobar-2.3, foobar-3.0, foobar-3.1 This is what I have expected to do. In new releases of pgTAP, I’d probably just add version lines. I might give certain releases names, but probably not. I’m too lazy, and if a given release has more than one new feature, it’d be a bit silly. I’ve never been very keen on this approach, but then I don’t understand packaging systems very well, so it might rock, and I just don’t know how to use it properly. But I cannot tell. So the idea is that you're actually supposed to separately catalog each feature you added (e.g. each new function), so that people can depend specifically on those features. Then if you remove the function again in some distant future, you stop advertising that feature (but you can still advertise any other features you added in the same release). If you're not going to do that, then this feature as proposed is strictly worse than figuring out a way to compare version numbers, because it's more work, some people will not bother to update the provides line, and other people will sometimes forget it. I don't really have the foggiest idea how people using other packaging systems handle this. It seems like it would be a huge pain in the rear end to be continually adding Provides: lines to RPMs for every new feature that a new version of a package offers, not to mention that you'd hardly want the corresponding Requires: lines to have to enumerate all the public interfaces those packages used just in case one of them ever went away. I have a feeling I'm missing part of the picture here, somehow. Basically those examples are far too fine grained. Let's get back to the example of ip4r that I know better. 
I would imagine those provides lines in there: provides = ip4, ip4r provides = ip4, ip4r, ip4r_gist, ip4r_gin Then when adding ipv6 support and datatypes: provides = ip4, ip4r, ip6, ip6r provides = ip4, ip4r, ip6, ip6r, ip4r_gist, ip4r_gin, ip6r_gist, ip6r_gin Pick any one as what I would consider a realistic example. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
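Checking a dependency against provides lines like these is just a set-membership test across the installed control files. A minimal sketch of that resolution step, using the ip4r feature names from the example above (the function itself is illustrative, not the patch's implementation):

```python
def missing_features(requires, installed):
    """Return the required feature names no installed extension provides.

    requires  -- list of feature names from a 'requires' control line
    installed -- dict mapping extension name -> set of features its
                 control file advertises in its 'provides' line
    """
    provided = set()
    for features in installed.values():
        provided.update(features)
    # Preserve the order of the requires line in the error report.
    return [feature for feature in requires if feature not in provided]
```

An extension requiring `ip6r` would thus install cleanly only once the ipv6-capable ip4r release, whose provides line advertises `ip6r`, is present.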
Re: [HACKERS] Finer Extension dependencies
Robert Haas robertmh...@gmail.com writes: So the idea is that you're actually supposed to separately catalog each feature you added (e.g. each new function), so that people can depend specifically on those features. I don't really have the foggiest idea how people using other packaging systems handle this. It seems like it would be a huge pain in the rear end to be continually adding Provides: lines to RPMs for every new feature that a new version of a package offers, not to mention that you'd hardly want the corresponding Requires: lines to have to enumerate all the public interfaces those packages used just in case one of them ever went away. I have a feeling I'm missing part of the picture here, somehow. Yeah. AFAIK, nobody actually does that. In my experience with Red Hat packages, so-called virtual Provides (which are exactly equivalent to this proposed feature) are used only for cases where there is or is planned to be more than one package that can supply a given API, and the Provides is really more of a logical package name than an identifier of a feature as such. When people want to depend on a feature that was added after initial release of a package, they invariably use versioned dependencies like Requires: foobar >= nnn. And it's also pretty common to use such a dependency because you need to have a bug fixed that was fixed in version nnn; so assuming that you only need feature names for, er, features may be a mistake too. So if you look at common practice, this whole idea is wrong and we ought to define a way to compare version numbers instead. I'm not entirely sure that I want to go there yet, but it certainly bears considering as a time-tested alternative design. regards, tom lane
Re: [HACKERS] Finer Extension dependencies
On Wed, 2012-03-28 at 23:00 -0700, Hitoshi Harada wrote: I totally agree with Robert's point that one feature is not standardized and nobody can tell how you can depend on the feature in the end. Mind you, I've never heard about building dependency by its name as a string in other packaging system. If you want to introduce the concept of version dependency not feature name dependency, do *it*; I don't think feature dependency solves it. The Python setuptools (a.k.a. distutils a.k.a. distribute a.k.a. eggs a.k.a. easy_install a.k.a. dont-get-me-started) system supports feature names that a package can provide, but it uses them for a different purpose. The idea is that a package foo can depend on a package bar[somethingextra], and then bar itself would declare its dependencies such that it depends, say, on ham, but if feature somethingextra is required, it also depends on eggs. This is actually quite useful, but it breaks down when you, say, want to wrap your egg into a Debian package.
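The setuptools "extras" mechanism Peter describes amounts to conditional dependency expansion. The sketch below models it with plain dicts (the package names come from the ham/eggs example; the resolver itself is a toy, not setuptools code):

```python
def resolve(requirement, dependencies, extras):
    """Expand a requirement like "bar[somethingextra]" into a dependency set.

    dependencies -- dict: package -> list of unconditional dependencies
    extras       -- dict: package -> {extra name -> extra dependencies}
    """
    name, _, rest = requirement.partition("[")
    wanted = [e.strip() for e in rest.rstrip("]").split(",")] if rest else []
    deps = set(dependencies[name])        # bar always depends on ham
    for extra in wanted:
        deps.update(extras[name][extra])  # bar[somethingextra] adds eggs
    return deps
```

Requiring plain `bar` yields only the base set, while `bar[somethingextra]` pulls in the extra's additions as well, which is the fine-grained behavior that has no direct counterpart in a dpkg/rpm dependency line.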
Re: [HACKERS] Finer Extension dependencies
On Thu, 2012-03-29 at 09:51 +0200, Dimitri Fontaine wrote: I don't want to introduce version dependency, because I don't think we need it. If you want to compare what we're doing here with say debian packaging, then look at how they package libraries. The major version number is now part of the package name and you depend on that directly. So let's take the shortcut to directly depend on the “feature” name. For a PostgreSQL extension example, we could pick ip4r. That will soon include support for ipv6 (it's already done code-wise, missing docs update). If you want to use ip4r for storing ipv6, you will simply require “ip6r” or whatever feature name is provided by the extension including it. Note that in Debian, virtual package names (the ones you give in the Provides line) are effectively centrally managed. Most are specified by the policy, the rest are managed between the affected packages. This rests on the assumption that very little outside packaging that deviates from the official Debian packaging goes on. Since we don't have any central coordinator or authority of that kind, we need to design the system differently. At the very least, I would suggest that feature names are per-extension. So in the above example you could depend on something like ipv4[ipv6].
Re: [HACKERS] ECPG FETCH readahead
On Thu, Mar 29, 2012 at 01:03:41PM -0400, Noah Misch wrote: Still, we're looking at dedicated ECPG syntax, quite visible even to folks with no interest in Informix. We have eschewed littering our syntax with compatibility aids, and I like it that way. IMO, an option to the ecpg preprocessor is an acceptable level of muddle to provide this aid. A DECLARE CURSOR syntax extension goes too far. +1 from me. Let's not add special ecpg sql syntax if we don't absolutely have to. Michael -- Michael Meskes Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org Jabber: michael.meskes at googlemail dot com VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL
Re: [HACKERS] Finer Extension dependencies
On Thu, Mar 29, 2012 at 2:28 PM, Peter Eisentraut pete...@gmx.net wrote: On Wed, 2012-03-28 at 23:00 -0700, Hitoshi Harada wrote: I totally agree with Robert's point that one feature is not standardized and nobody can tell how you can depend on the feature in the end. Mind you, I've never heard about building dependency by its name as a string in other packaging system. If you want to introduce the concept of version dependency not feature name dependency, do *it*; I don't think feature dependency solves it. The Python setuptools (a.k.a. distutils a.k.a. distribute a.k.a. eggs a.k.a. easy_install a.k.a. dont-get-me-started) system supports feature names that a package can provide, but it uses them for a different purpose. The idea is that a package foo can depend on a package bar[somethingextra], and then bar itself would declare its dependencies such that it depends, say, on ham, but if feature somethingextra is required, it also depends on eggs. This is actually quite useful, Wow. That is complex, but I agree that it's useful. but it breaks down when you, say, want to wrap your egg into a Debian package. *blink* Huh? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Finer Extension dependencies
Peter Eisentraut pete...@gmx.net writes: At the very least, I would suggest that feature names are per-extension. Yeah, I had about come to that conclusion too. A global namespace for them would be a mistake given lack of central coordination. regards, tom lane
Re: [HACKERS] Finer Extension dependencies
On Thu, Mar 29, 2012 at 2:25 PM, Tom Lane t...@sss.pgh.pa.us wrote: Yeah. AFAIK, nobody actually does that. In my experience with Red Hat packages, so-called virtual Provides (which are exactly equivalent to this proposed feature) are used only for cases where there is or is planned to be more than one package that can supply a given API, and the Provides is really more of a logical package name than an identifier of a feature as such. Note that this feature as designed will NOT service that use-case, as discussed upthread, and in fact it would probably need a full rewrite to do so, because the principle of operation is based on the dependency mechanism, which doesn't support this. In fact, ALTER EXTENSION .. UPDATE as implemented in this patch is already doing some clever tomfoolery to make things work as the user will expect - when you update an extension, it removes the dependencies between the extension and its extension features, then drops any extension features that are no longer needed, then puts back the dependencies. This produces pretty logical error messages, but I don't think it generalizes to any of the other things we might want to do (e.g. version number dependencies) so while we could commit this patch the way it is now I think doing anything more complex will require a different approach. When people want to depend on a feature that was added after initial release of a package, they invariably use versioned dependencies like Requires: foobar = nnn. And it's also pretty common to use such a dependency because you need to have a bug fixed that was fixed in version nnn; so assuming that you only need feature names for, er, features may be a mistake too. Hmm, interesting. So if you look at common practice, this whole idea is wrong and we ought to define a way to compare version numbers instead. I'm not entirely sure that I want to go there yet, but it certainly bears considering as a time-tested alternative design. 
I think that's definitely worth considering. One idea would be to mandate that you can only use the version-dependencies feature if both versions are of some specific form (e.g. two or three integers separated by periods). If you try to depend on foo >= 1.b or if you depend on foo >= 1.3 but the installed version is 1.b, it errors out. Frankly, I'm not sure we bet on the right horse in not mandating a version numbering scheme from the beginning. But given that we didn't, we probably don't want to get too forceful about it too quickly. However, we could ease into it by documenting a recommended numbering scheme and making features like version-dependencies work only when that scheme is used. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
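For what it's worth, the restricted scheme Robert describes (two or three integers separated by periods) is cheap to implement. A minimal sketch in C — purely illustrative, not from the patch:

```c
#include <stdbool.h>

/*
 * Parse a version of the form "N.N" or "N.N.N" (decimal integers only).
 * Returns false for anything else -- e.g. "1.b" -- which under the
 * proposed rule would make a version dependency error out.
 */
static bool
parse_version(const char *s, int out[3])
{
    int part = 0;

    out[0] = out[1] = out[2] = 0;
    for (;;)
    {
        if (*s < '0' || *s > '9')
            return false;       /* each component must start with a digit */
        while (*s >= '0' && *s <= '9')
            out[part] = out[part] * 10 + (*s++ - '0');
        if (*s == '\0')
            return part >= 1;   /* saw two or three components */
        if (*s != '.' || ++part > 2)
            return false;
        s++;
    }
}

/* Component-wise comparison; a missing third component counts as 0. */
static int
compare_versions(const int a[3], const int b[3])
{
    for (int i = 0; i < 3; i++)
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    return 0;
}
```

With something like this in place, a dependency such as foo >= 1.3 reduces to parse-then-compare, with a hard error whenever either side fails to parse.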
Re: [HACKERS] Finer Extension dependencies
On Thu, 2012-03-29 at 14:39 -0400, Robert Haas wrote: but it breaks down when you, say, want to wrap your egg into a Debian package. *blink* Huh? Well, you can't represent that mechanism in a Debian (or RPM) package dependency. So the alternatives are to make it a Recommends and add a free-form explanation somewhere, or make it a hard Depends, thus overriding the fine-grained dependency setup. More to the point, I think any mechanism we dream up here that is smarter than what dpkg or rpm can represent is going to be useful only in niche situations.
Re: [HACKERS] Finer Extension dependencies
On Mar 29, 2012, at 11:48 AM, Robert Haas wrote: Frankly, I'm not sure we bet on the right horse in not mandating a version numbering scheme from the beginning. But given that we didn't, we probably don't want to get too forceful about it too quickly. However, we could ease into it by documenting a recommended numbering scheme and making features like version-dependencies work only when that scheme is used. PGXN mandates semantic versions for this reason (currently v1.0.0): http://semver.org/spec/v1.0.0.html Removes all the ambiguity. David
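The ambiguity David mentions is largely about ordering: under a mandated numeric scheme, "1.10.0" is newer than "1.9.0", whereas naive string comparison says the opposite. A tiny illustration (not PGXN code; validation and pre-release tags omitted):

```c
#include <stdio.h>

/*
 * Compare two "X.Y.Z" versions numerically, the way semver intends.
 * This only exists to show why lexical comparison of version strings
 * gives wrong answers.
 */
static int
semver_cmp(const char *a, const char *b)
{
    int ax = 0, ay = 0, az = 0;
    int bx = 0, by = 0, bz = 0;

    sscanf(a, "%d.%d.%d", &ax, &ay, &az);
    sscanf(b, "%d.%d.%d", &bx, &by, &bz);
    if (ax != bx) return ax < bx ? -1 : 1;
    if (ay != by) return ay < by ? -1 : 1;
    if (az != bz) return az < bz ? -1 : 1;
    return 0;
}
```

Here strcmp("1.10.0", "1.9.0") is negative (lexically '1' sorts before '9'), while semver_cmp correctly reports 1.10.0 as the newer version.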
Re: [HACKERS] Finer Extension dependencies
Tom Lane t...@sss.pgh.pa.us writes: Peter Eisentraut pete...@gmx.net writes: At the very least, I would suggest that feature names are per-extension. Yeah, I had about come to that conclusion too. A global namespace for them would be a mistake given lack of central coordination. That's how I did it first, but Alvaro was opposed to that because it allows for more than one extension to provide for the same feature name. http://archives.postgresql.org/pgsql-hackers/2012-03/msg01425.php Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Finer Extension dependencies
Dimitri Fontaine dimi...@2ndquadrant.fr writes: Tom Lane t...@sss.pgh.pa.us writes: Peter Eisentraut pete...@gmx.net writes: At the very least, I would suggest that feature names are per-extension. Yeah, I had about come to that conclusion too. A global namespace for them would be a mistake given lack of central coordination. That's how I did it first, but Alvaro was opposed to that because it allows for more than one extension to provide for the same feature name. http://archives.postgresql.org/pgsql-hackers/2012-03/msg01425.php Right, but the question that has to be considered is how often would that be intentional as opposed to an undesirable name collision. I think Hitoshi was right upthread that it will seldom if ever be the case that somebody is independently reimplementing somebody else's API, so the use-case for intentional substitution seems thin. regards, tom lane
Re: [HACKERS] ECPG FETCH readahead
On 2012-03-29 20:34, Michael Meskes wrote: On Thu, Mar 29, 2012 at 01:03:41PM -0400, Noah Misch wrote: Still, we're looking at dedicated ECPG syntax, quite visible even to folks with no interest in Informix. We have eschewed littering our syntax with compatibility aids, and I like it that way. IMO, an option to the ecpg preprocessor is an acceptable level of muddle to provide this aid. A DECLARE CURSOR syntax extension goes too far. +1 from me. Let's not add special ecpg sql syntax if we don't absolutely have to. Michael This is what I waited for, finally another opinion. I will delete this from the grammar. What about the other question about the 0 vs 1 distinction? Without the second patch, 0 drives the cursor with the old ECPGdo() code. All other values drive the cursor via the new ECPGopen() et al. Should we keep this behaviour? In this case the 0 vs 1 distinction is not needed in add_cursor() as the 0 value doesn't even reach ECPGopen(). But if we consider including the second patch as well, should we keep the NO READAHEAD option in the grammar and in cursor.c? After solving the problem with WHERE CURRENT OF, this and the 0-vs-1 distinction start to feel pointless. And what about the override via the environment variable? Should it work only upwards? Best regards, Zoltán Böszörményi -- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Re: [HACKERS] HTTP Frontend? (and a brief thought on materialized views)
On Thu, Mar 29, 2012 at 8:04 AM, Andrew Dunstan and...@dunslane.net wrote: 1. I've been in discussion with some people about adding simple JSON extract functions. We already have some (i.e. xpath()) for XML. 2. You might find htsql http://htsql.org/ interesting. My colleagues and I have thought about this quite a lot. Even without changing any of FEBE's semantics (so far, I think they're basically fine, with the glaring problem of hidden session state) I think it may be a good time to experiment with supporting SPDY (clearly, 9.3+), which solves some of the problems of HTTP that make it pretty unusable for interactive database workloads. For a long time, it looked like SPDY would become the IETF's HTTP 2.0. But the future is still clouded, as just recently Microsoft introduced their own mostly-compatible design. So that's a bit scary. More concretely, here's what I really want. The first two concerns are almost entirely technical, the latter two social, in that they rely on the notion of 'common' or 'standard':
* A standard frame extension format. For example, last I checked Postgres-XC, it required snapshot information to be passed, and it'd be nice if, instead of having to hack the protocol, they could attach an X-Snapshot-Info header, or whatever. This could also be a nice way to pass out-of-band hint information of all sorts.
* Something similar to the HTTP Host header, so that one can route to databases without having to conflate database identity with the actual port being connected to. Yes, theoretically it can be done with weird startup packet gyrations, but that is firmly in the weird category.
* HTTP -- and *probably* its hypothetical progeny -- are more common than FEBE packets, and a lot of incidental complexity of writing routers is reduced by the commonality of routing HTTP traffic.
* Protocol compression -- but a bit of sand in the gears is *which* compression -- for database workloads, the performance of zlib can be a meaningful bottleneck.
Lastly, a case that cannot as easily be fixed without some more thinking is leveraging the caching semantics of HTTP. I think people would really, really like that, if they could get away from having to hand-roll their own cache regeneration in common cases. -- fdr
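The Host-header point is essentially "give the router a database name it can see without deep packet inspection". A toy sketch of what such routing logic could look like (hypothetical naming convention; nothing like this exists in the server or any proxy named here):

```c
#include <stddef.h>
#include <string.h>

/*
 * Map a hypothetical "Host: <dbname>.db.example.com" header value to a
 * backend database name, the way an HTTP-aware proxy could route
 * connections without conflating database identity with a port number.
 * Returns the length of the extracted name, or 0 if the suffix doesn't
 * match or the name won't fit.
 */
static size_t
route_database(const char *host, const char *suffix,
               char *dbname, size_t dbsz)
{
    size_t hlen = strlen(host);
    size_t slen = strlen(suffix);

    if (hlen <= slen || strcmp(host + hlen - slen, suffix) != 0)
        return 0;
    hlen -= slen;                   /* length of the subdomain part */
    if (hlen == 0 || hlen >= dbsz)
        return 0;
    memcpy(dbname, host, hlen);
    dbname[hlen] = '\0';
    return hlen;
}
```

The FEBE equivalent today is the database field of the startup packet, which a router can only see by speaking the full protocol — that is the "weird startup packet gyrations" part.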
Re: [HACKERS] HTTP Frontend? (and a brief thought on materialized views)
On Thu, Mar 29, 2012 at 12:57 PM, Daniel Farina dan...@heroku.com wrote: D'oh, I munged the order. More technical concerns: * Protocol compression -- but a bit of sand in the gears is *which* compression -- for database workloads, the performance of zlib can be a meaningful bottleneck. * Something similar to the HTTP Host header, so that one can route to databases without having to conflate database identity with the actual port being connected to. Yes, theoretically it can be done with weird startup packet gyrations, but that is firmly in the weird category. Socialish (but no less important): * A standard frame extension format. For example, last I checked Postgres-XC, it required snapshot information to be passed, and it'd be nice that instead of having to hack the protocol that they could attach an X-Snapshot-Info header, or whatever. This could also be a nice way to pass out-of-band hint information of all sorts. * HTTP -- and *probably* its hypothetical progeny -- are more common than FEBE packets, and a lot of incidental complexity of writing routers is reduced by the commonality of routing HTTP traffic. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
I wrote: ... PREPARE/EXECUTE work a bit funny though: if you have track = all then you get EXECUTE cycles reported against both the EXECUTE statement and the underlying PREPARE. This is because when PREPARE calls parse_analyze_varparams the post-analyze hook doesn't know that this isn't a top-level statement, so it marks the query with a queryId. I don't see any way around that part without something like what I suggested before. However, this behavior seems to me to be considerably less of a POLA violation than the cases involving two identical-looking entries for self-contained statements, and it might even be thought to be a feature not a bug (since the PREPARE entry will accumulate totals for all uses of the prepared statement). So I'm satisfied with it for now. Actually, there's an easy hack for that too: we can teach the ProcessUtility hook to do nothing (and in particular not increment the nesting level) when the statement is an ExecuteStmt. This will result in the executor time being blamed on the original PREPARE, whether or not you have enabled tracking of nested statements. That seems like a substantial win to me, because right now you get a distinct EXECUTE entry for each textually-different set of parameter values, which seems pretty useless. This change would make use of PREPARE/EXECUTE behave very nearly the same in pg_stat_statement as use of protocol-level prepared statements. About the only downside I can see is that the cycles expended on evaluating the EXECUTE's parameters will not be charged to any pg_stat_statement entry. Since those can be expressions, in principle this might be a non-negligible amount of execution time, but in practice it hardly seems likely that anyone would care about it. Barring objections I'll go fix this, and then this patch can be considered closed except for possible future tweaking of the sticky-entry decay rule. 
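In stub form, the tweak Tom describes amounts to an early return in the utility hook. The types below are simplified stand-ins for illustration, not the actual pg_stat_statements code:

```c
/* Simplified stand-ins for the relevant PostgreSQL node types. */
typedef enum { T_ExecuteStmt, T_PrepareStmt, T_CreateStmt } NodeTag;
typedef struct Node { NodeTag type; } Node;

static int nested_level = 0;     /* pg_stat_statements' nesting depth */
static int utility_entries = 0;  /* statements tracked as utility commands */

/*
 * Sketch of the proposed behaviour: EXECUTE is passed through without
 * incrementing the nesting level or creating its own entry, so executor
 * time gets charged to the underlying PREPARE instead of to one entry
 * per textually-distinct parameter list.
 */
static void
pgss_ProcessUtility(Node *parsetree)
{
    if (parsetree->type != T_ExecuteStmt)
    {
        nested_level++;
        utility_entries++;   /* an entry would be created and timed here */
        nested_level--;
    }
    /* either way, the statement itself still runs
     * (standard_ProcessUtility in the real hook) */
}
```

Because the nesting level stays at zero for EXECUTE, the executor hooks see it as a top-level statement and attribute the cycles to the PREPARE's queryId, whether or not nested-statement tracking is enabled.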
regards, tom lane
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On Thu, Mar 29, 2012 at 4:05 PM, Tom Lane t...@sss.pgh.pa.us wrote: I wrote: ... PREPARE/EXECUTE work a bit funny though: if you have track = all then you get EXECUTE cycles reported against both the EXECUTE statement and the underlying PREPARE. This is because when PREPARE calls parse_analyze_varparams the post-analyze hook doesn't know that this isn't a top-level statement, so it marks the query with a queryId. I don't see any way around that part without something like what I suggested before. However, this behavior seems to me to be considerably less of a POLA violation than the cases involving two identical-looking entries for self-contained statements, and it might even be thought to be a feature not a bug (since the PREPARE entry will accumulate totals for all uses of the prepared statement). So I'm satisfied with it for now. Actually, there's an easy hack for that too: we can teach the ProcessUtility hook to do nothing (and in particular not increment the nesting level) when the statement is an ExecuteStmt. This will result in the executor time being blamed on the original PREPARE, whether or not you have enabled tracking of nested statements. That seems like a substantial win to me, because right now you get a distinct EXECUTE entry for each textually-different set of parameter values, which seems pretty useless. This change would make use of PREPARE/EXECUTE behave very nearly the same in pg_stat_statement as use of protocol-level prepared statements. About the only downside I can see is that the cycles expended on evaluating the EXECUTE's parameters will not be charged to any pg_stat_statement entry. Since those can be expressions, in principle this might be a non-negligible amount of execution time, but in practice it hardly seems likely that anyone would care about it. Barring objections I'll go fix this, and then this patch can be considered closed except for possible future tweaking of the sticky-entry decay rule. 
After reading your last commit message, I was wondering if something like this might be possible, so +1 from me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] query cache
On Fri, Mar 23, 2012 at 2:54 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: What I think is more common is the repeated submission of queries that are *nearly* identical, but with either different parameter bindings or different constants. It would be nice to have some kind of cache that would allow us to avoid the overhead of parsing and planning nearly identical statements over and over again, but the trick is that you have to fingerprint the query to notice that's happening in the first place, and the fingerprinting has to cost less than what the cache saves you. I don't know whether that's possible, but I suspect it's far from easy. The traditional solution to this is to make the application do it, ie, parameterized prepared statements. Since that has direct benefits to the application as well, in that it can use out-of-line values and thereby avoid quoting and SQL-injection risks, it's not apparent that it's a good idea to expend lots of sweat to reverse-engineer parameterization from a collection of unparameterized queries. But there are several disadvantages to unparameterized queries, too. One, it requires more messages at the protocol level, which is not free. Two, whatever abstraction layer you're using to submit queries to the database has to support it, and not all of them do. And three, you get worse query plans if lack of knowledge about the specific query parameters proves to be essential. I know you've done some work on this last point for 9.2, but I'm fuzzy on the details and on how much benefit we'll get out of it in real-world cases. Interestingly, Peter Geoghegan's blog post on the pg_stat_statements patch you just committed[1] claims that the overhead of fingerprinting queries was only 1-2.5%, which is less than I would have thought, so if we ever get to the point where we're fairly sure we've got problem three licked, it might make sense to revisit this due to problems one and two. 
It's also probably worth keeping in mind the next time we bump the protocol version: it would be nice to have a way of doing prepare-bind-execute in a single protocol message, which I believe to be not possible at present. [1] http://pgeoghegan.blogspot.com/2012/03/much-improved-statement-statistics.html -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
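On the fingerprinting cost: the essence is collapsing literals so that nearly-identical statements hash to one entry. The pg_stat_statements work does this on the post-parse-analysis tree; the string-level version below is only a toy to convey the idea (it would mangle digits inside identifiers, miss dollar quoting, and so on):

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Replace quoted strings and integer literals with '?' so that queries
 * differing only in constants produce the same fingerprint text.
 */
static void
fingerprint(const char *sql, char *out, size_t outsz)
{
    size_t j = 0;

    for (size_t i = 0; sql[i] != '\0' && j + 1 < outsz; i++)
    {
        if (sql[i] == '\'')
        {
            i++;                            /* skip the string literal */
            while (sql[i] != '\0' && sql[i] != '\'')
                i++;
            out[j++] = '?';
        }
        else if (isdigit((unsigned char) sql[i]))
        {
            while (isdigit((unsigned char) sql[i + 1]))
                i++;                        /* swallow the whole number */
            out[j++] = '?';
        }
        else
            out[j++] = sql[i];
    }
    out[j] = '\0';
}
```

Two statements differing only in their constants then fingerprint identically, which is exactly the property a plan cache (or pg_stat_statements) needs before it can aggregate them.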
Re: [HACKERS] Command Triggers patch v18
Robert Haas robertmh...@gmail.com writes: Well, preceding and before are synonyms, so I don't see any advantage in that change. But I really did mean AT permissions_checking time, not before or after it. That is, we'd have a hook where instead of doing something like this: aclresult = pg_class_aclcheck(reloid, GetUserId(), ACL_SELECT); ...we'd call a superuser-supplied trigger function and let it make the Wow. I don't think I like that at all. It's indeed powerful, but how do you go explaining and fixing unanticipated behavior with such things in production? It looks too much like an invitation to break a very careful design where each facility has to prove itself to get in. So yes, that's not at all what I was envisioning, I still think that most event specs for event triggers are going to have to happen in between our different internal implementations of a given facility. Maybe we should even use BEFORE in cases I had in mind and AT for cases like the one you're picturing? CREATE EVENT TRIGGER name ON event_name (event_subtype_name [, ...]) EXECUTE PROCEDURE function_name(args); I would amend that syntax to allow for a WHEN expr much like in the DML trigger case, where the expression can play with the variables exposed at the time of calling the trigger. create event trigger prohibit_some_ddl preceding timing spec when tag in ('CREATE TABLE', 'ALTER TABLE') execute procedure throw_an_error(); We could do it that way, but the tag in ('CREATE TABLE', 'ALTER TABLE') syntax will be tricky to evaluate. I'd really prefer NOT to need to start up and shut down the executor to figure out whether the trigger has to fire. If we are going to do it that way then we need to carefully define what variables are available and what values they get. I think that most people's event-filtering needs will be simple enough to be served by a more declarative syntax.
We could then maybe restrict that idea so that the syntax is more like when trigger variable in (literal[, ...]) So that we just have to store the array of strings and support only one operation there. We might want to go as far as special casing the object id to store an oid vector there rather than a text array, but we could also decide not to support per-oid command triggers in the first release and remove that from the list of accepted trigger variables. I also don't like the way you squeeze in the drop cascade support here by requalifying it as a timing spec, which it's clearly not. On subcommand_start, what is the tag set to? the main command or the subcommand? if the former, where do you find the current subcommand tag? I don't think subcommand_start should be a timing spec either. That's up for grabs, but I was thinking the subcommand. For example, consider: alter table t alter column a add column b int, disable trigger all; I would imagine this having a firing sequence something like this:
ddl_command_start:alter_table
ddl_command_permissions_check:alter_table
ddl_command_name_lookup:alter_table
alter_table_subcommand_prep:add_column
alter_table_subcommand_prep:disable_trigger_all
alter_table_subcommand_start:add_column
sql_create:column
alter_table_subcommand_end:add_column
alter_table_subcommand_start:disable_trigger_all
alter_table_subcommand_end:disable_trigger_all
ddl_command_end:alter_table
Lots of room for argument there, of course. Yeah well, in my mind the alter table actions are not subcommands yet because they don't go back to standard_ProcessUtility(). The commands in CREATE SCHEMA do that, same for create table foo(id serial primary key) and its sequence and index. We could maybe add another variable called context in the command trigger procedure API that would generally be NULL and would be set to the main command tag if we're running a subcommand.
We could still support alter table commands as sub commands as far as the event triggers are concerned, and document that you don't get a parse tree there even when implementing the trigger in C. Yeah, I would not try to pass the parse tree to every event trigger, just the ones where it's well-defined. For drops, I would just say, hey, this is what we dropped. The user can log it or ereport, but that's about it. Still, that's clearly enough to do a lot of really interesting stuff, replication being the most obvious example. Yes the use case is important enough to want to support it. Now, to square it into the design. I think it should fire as a “normal” event trigger, with the context set to e.g. DROP TYPE, the tag set to this object classid (e.g. DROP FUNCTION). It's easy enough to implement given that CreateCommandTag() already does switch (((DropStmt *) parsetree)->removeType), we just need to move that code into another function and reuse it at the right places. It's only missing DROP COLUMN apparently, I would have to add that. So you would know that the trigger is firing for a DROP FUNCTION that
Re: [HACKERS] query cache
Robert Haas robertmh...@gmail.com writes: Interestingly, Peter Geoghegan's blog post on the pg_stat_statements patch you just committed[1] claims that the overhead of fingerprinting queries was only 1-2.5%, which is less than I would have thought, so if we ever get to the point where we're fairly sure we've got problem three licked, it might make sense to revisit this due to problems one and two. Maybe. It's also probably worth keeping in mind the next time we bump the protocol version: it would be nice to have a way of doing prepare-bind-execute in a single protocol message, which I believe to be not possible at present. Huh? That's the standard way of doing it, actually. You send Prepare/Bind/Execute/Sync in one packet, and wait for results. This requires that you understand the query well enough to perform Bind without seeing a Describe result, but that seems essential to any one-round-trip case anyway. It's not the protocol design that is holding back anybody who wants to do that. regards, tom lane
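Concretely, the v3 protocol lets a client concatenate Parse ('P'), Bind ('B'), Execute ('E'), and Sync ('S') into one buffer and issue a single write — this is in fact what libpq's PQexecParams does internally. A rough framing sketch for a parameterless query, per the documented FEBE message layouts (error handling and parameters omitted):

```c
#include <stdint.h>
#include <string.h>

/* Append a big-endian int32 / int16 at pos; return the new position. */
static size_t put_i32(unsigned char *buf, size_t pos, uint32_t v)
{
    buf[pos] = (unsigned char)(v >> 24);
    buf[pos + 1] = (unsigned char)(v >> 16);
    buf[pos + 2] = (unsigned char)(v >> 8);
    buf[pos + 3] = (unsigned char)v;
    return pos + 4;
}
static size_t put_i16(unsigned char *buf, size_t pos, uint16_t v)
{
    buf[pos] = (unsigned char)(v >> 8);
    buf[pos + 1] = (unsigned char)v;
    return pos + 2;
}
/* Append a NUL-terminated string. */
static size_t put_str(unsigned char *buf, size_t pos, const char *s)
{
    size_t n = strlen(s) + 1;
    memcpy(buf + pos, s, n);
    return pos + n;
}
/* Each message is: type byte, then an int32 length counting itself but
 * not the type byte; the length is backpatched once the body is done. */
static size_t begin_msg(unsigned char *buf, size_t pos, char type, size_t *len_at)
{
    buf[pos++] = (unsigned char)type;
    *len_at = pos;
    return pos + 4;
}
static void end_msg(unsigned char *buf, size_t len_at, size_t end)
{
    put_i32(buf, len_at, (uint32_t)(end - len_at));
}

/* Frame Parse + Bind + Execute + Sync into one buffer, suitable for a
 * single write() -- i.e. one network round trip. */
static size_t build_one_roundtrip(unsigned char *buf, const char *query)
{
    size_t pos = 0, len_at;

    pos = begin_msg(buf, pos, 'P', &len_at);    /* Parse */
    pos = put_str(buf, pos, "");                /* unnamed statement */
    pos = put_str(buf, pos, query);
    pos = put_i16(buf, pos, 0);                 /* no parameter type OIDs */
    end_msg(buf, len_at, pos);

    pos = begin_msg(buf, pos, 'B', &len_at);    /* Bind */
    pos = put_str(buf, pos, "");                /* unnamed portal */
    pos = put_str(buf, pos, "");                /* unnamed statement */
    pos = put_i16(buf, pos, 0);                 /* no param format codes */
    pos = put_i16(buf, pos, 0);                 /* no parameter values */
    pos = put_i16(buf, pos, 0);                 /* no result format codes */
    end_msg(buf, len_at, pos);

    pos = begin_msg(buf, pos, 'E', &len_at);    /* Execute */
    pos = put_str(buf, pos, "");                /* unnamed portal */
    pos = put_i32(buf, pos, 0);                 /* no row-count limit */
    end_msg(buf, len_at, pos);

    pos = begin_msg(buf, pos, 'S', &len_at);    /* Sync */
    end_msg(buf, len_at, pos);

    return pos;
}
```

As Tom notes, the catch is Bind: without an intervening Describe round trip the client has to know the parameter and result formats up front.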
Re: [HACKERS] Command Triggers patch v18
Robert Haas robertmh...@gmail.com writes: Apropos of nothing and since I haven't found a particularly good time to say this in amidst all the technical discussion, I appreciate very much all the work you've been putting into this. Hey, thanks, I very much appreciate your support here! (1) is not hard to fix as soon as we set a policy, which is the most pressing thing we need to do in my mind, whatever we manage to commit in 9.2. I agree that setting a policy is enormously important, not so sure about how easy it is to fix, but we shall see. I think your proposal on the table is fixing it (the new event timing specs), it's all about fine tuning it now. My primary and overriding goal here is to make sure that we don't make any design decisions, in the syntax or otherwise, that foreclose the ability to add policies that we've not yet dreamed of to future Yeah we're not hard coding the list of possible event timings in the syntax. In terms of policies, there are two that seem to me to be very clear: at the very beginning of ProcessUtility, and at the very end of it, *excluding* recursive calls to ProcessUtility that are intended to handle subcommands. I think we could call the first start or command_start or ddl_command_start or post-parse or pre-locking or something along those lines, and the second end or command_end or ddl_command_end. I think it might be worth including ddl in there because the scope of the current patch really is just DDL (as opposed to say DCL) and having a separate set of firing points for DCL does not seem dumb. If we ever try to do anything with transaction control as you suggested upthread that's likely to also require special handling. Calling it, specifically, ddl_command_start makes the scope very clear. I'm not sure about that really.
First, the only reason why DCL commands are not supported in the current patch is because roles are global objects and we don't want role related behavior to depend on which database you're currently connected to. Then, I guess any utility command implementation will share the same principle of having a series of steps and allowing a user defined function to get called in between some of them. So adding support for commands that won't fit in the DDL timing specs we're trying to design now amounts to adding timing specs for those commands. I'm not sold that the timing specs should contain ddl or dcl or other things, really, I find that to be ugly, but I won't fight over that. (2) can be addressed with a simple blacklist that would be set just before calling user defined code, and cleaned when done running it. When in place the blacklist lookup is easy to implement in a central place (utility.c or cmdtrigger.c) and ereport() when the current command is in the blacklist. e.g. alter table would blacklist alter table and drop table commands on the current object id Here we're talking about a firing point that we might call ddl_post_name_lookup or something along those lines. I would prefer to handle this by making the following rules - here I'm assuming that we're talking about the case where the object in question is a relation: 1. The trigger fires immediately after RangeVarGetRelidExtended and before any other checks are performed, and especially before any CheckTableNotInUse(). This last is important, because what matters is whether the table's not in use AFTER all possible user-supplied code is executed. If the trigger opens or closes cursors, for example, what matters is whether there are any cursors left open *after* the triggers complete, not whether there were any open on entering the trigger.
The same is true of any other integrity check: if there's a chance that the trigger could change something that affects whether the integrity constraint gets fired, then we'd better be darn sure to fire the trigger before we check the integrity constraint. Ack. 2. If we fire any triggers at this point, we CommandCounterIncrement() and then re-do RangeVarGetRelidExtended() with the same arguments we passed before. If it doesn't return the same OID we got the first time, we abort with some kind of serialization error. We need to be a little careful about the phrasing of the error message, because it's possible for this to change during command execution even without command triggers if somebody creates a table earlier in the search path with the same unqualified name that was passed to the command, but I think it's OK for the existence of a command-trigger to cause a spurious abort in that very narrow case, as long as the error message includes some appropriate weasel language. Alternatively, we could try to avoid that corner case by rechecking only that a tuple with the right OID is still present and still has the correct relkind, but that seems like it might be a little less bullet-proof for no real functional gain. I can buy having a
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
Kyotaro HORIGUCHI horiguchi.kyot...@oss.ntt.co.jp writes: I'm sorry to have coded a silly bug. The previous patch has a bug in realloc size calculation. And separation of the 'connname patch' was incomplete in regtest. It is fixed in this patch. I've applied a modified form of the conname update patch. It seemed to me that the fault is really in the DBLINK_GET_CONN and DBLINK_GET_NAMED_CONN macros, which ought to be responsible for setting the surrounding function's conname variable along with conn, rconn, etc. There was actually a second error of the same type visible in the dblink regression test, which is also fixed by this more general method. regards, tom lane
Re: [HACKERS] Command Triggers patch v18
On Thu, Mar 29, 2012 at 4:21 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Robert Haas robertmh...@gmail.com writes: Well, preceding and before are synonyms, so I don't see any advantage in that change. But I really did mean AT permissions_checking time, not before or after it. That is, we'd have a hook where instead of doing something like this: aclresult = pg_class_aclcheck(reloid, GetUserId(), ACL_SELECT); ...we'd call a superuser-supplied trigger function and let it make the Wow. I don't think I like that at all. It's indeed powerful, but how do you go explaining and fixing unanticipated behavior with such things in production? It looks too much like an invitation to break a very careful design where each facility has to prove itself to get in. I'm thinking of things like extension whitelisting. When some unprivileged user says CREATE EXTENSION harmless, and harmless is marked as superuser-only, we might like to have a hook that gets called *at permissions-checking time* and gets to say, oh, well, that extension is on the white-list, so we're going to allow it. I think you can come up with similar cases for other commands, where in general the operation is restricted to superusers or database owners or table owners but in specific cases you want to allow others to do it. CREATE EVENT TRIGGER name ON event_name (event_subtype_name [, ...]) EXECUTE PROCEDURE function_name(args); I would amend that syntax to allow for a WHEN expr much like in the DML trigger case, where the expression can play with the variables exposed at the time of calling the trigger. create event trigger prohibit_some_ddl preceding timing spec when tag in ('CREATE TABLE', 'ALTER TABLE') execute procedure throw_an_error(); We could do it that way, but the tag in ('CREATE TABLE', 'ALTER TABLE') syntax will be tricky to evaluate. I'd really prefer NOT to need to start up and shut down the executor to figure out whether the trigger has to fire.
If we are going to do it that way then we need to carefully define what variables are available and what values they get. I think that most people's event-filtering needs will be simple enough to be served by a more declarative syntax. We could then maybe restrict that idea so that the syntax is more like when trigger variable in (literal[, ...]) So that we just have to store the array of strings and support only one operation there. We might want to go as far as special casing the object id to store an oid vector there rather than a text array, but we could also decide not to support per-oid command triggers in the first release and remove that from the list of accepted trigger variables. I guess that would make sense if you think there would ever be more than one choice for trigger variable. I'm not immediately seeing a use case for that, though - I was explicitly viewing the syntax foo (bar, baz) to mean foo when the-only-trigger-variable-that-makes-sense-given-that-we-are-talking-about-a-trigger-on-foo in (bar, baz). More generally, my thought on the structure of this is that you're going to have certain toplevel events, many of which will happen at only a single place in the code, like an object got dropped or a DDL command started or a DDL command ended. So we give those names, like sql_drop, ddl_command_start, and ddl_command_end. Inside your trigger procedure, the set of magic variables that is available will depend on which toplevel event you set the trigger on, but hopefully all firings of that toplevel event can provide the same magic variables. For example, at ddl_command_start time, you're just gonna get the command tag, but at ddl_command_end time you will get the command tag plus maybe some other stuff. Now, we COULD stop there. 
I mean, we could document that you can create a trigger on ddl_command_start and every DDL command will fire that trigger, and if the trigger doesn't care about some DDL operations, then it can look at the command tag and return without doing anything for the operations it doesn't care about. The only real disadvantage of doing it that way is speed, and maybe a bit of code complexity within the trigger. So my further thought was that we'd allow you to specify an optional filter list to restrict which events would fire the trigger, and the exact meaning of that filter list would vary depending on the toplevel event you've chosen - i.e. for ddl_command_start, the filter would be a list of commands, but for sql_drop it would be a list of object types. We could make that more complicated if we think that an individual toplevel event will need more than one kind of filtering. For example, if you wanted to filter sql_drop events based on the object type AND/OR the schema name, then the syntax I proposed would obviously not be adequate. I'm just not convinced we need that, especially because you'd then need to set up a dependency between the command trigger and the schema,
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
Marko Kreen mark...@gmail.com writes: My conclusion is that row-processor API is low-level expert API and quite easy to misuse. It would be preferable to have something more robust as end-user API, the PQgetRow() is my suggestion for that. Thus I see 3 choices: 1) Push row-processor as main API anyway and describe all dangerous scenarios in documentation. 2) Have both PQgetRow() and row-processor available in libpq-fe.h, PQgetRow() as preferred API and row-processor for expert usage, with proper documentation what works and what does not. 3) Have PQgetRow() in libpq-fe.h, move row-processor to libpq-int.h. I still am failing to see the use-case for PQgetRow. ISTM the entire point of a special row processor is to reduce the per-row processing overhead, but PQgetRow greatly increases that overhead. And it doesn't reduce complexity much either IMO: you still have all the primary risk factors arising from processing rows in advance of being sure that the whole query completed successfully. Plus it conflates "no more data" with "there was an error receiving the data" or "there was an error on the server side". PQrecvRow alleviates the per-row-overhead aspect of that but doesn't really do a thing from the complexity standpoint; it doesn't look to me to be noticeably easier to use than a row processor callback. I think PQgetRow and PQrecvRow just add more API calls without making any fundamental improvements, and so we could do without them. "There's more than one way to do it" is not necessarily a virtue. Second conclusion is that current dblink row-processor usage is broken when user uses multiple SELECTs in SQL as dblink uses plain PQexec(). Yeah. Perhaps we should tweak the row-processor callback API so that it gets an explicit notification that this is a new resultset. Duplicating PQexec's behavior would then involve having the dblink row processor throw away any existing tuplestore and start over when it gets such a call.
There's multiple ways to express that but the most convenient thing from libpq's viewpoint, I think, is to have a callback that occurs immediately after collecting a RowDescription message, before any rows have arrived. So maybe we could express that as a callback with valid res but columns set to NULL? A different approach would be to add a row counter to the arguments provided to the row processor; then you'd know a new resultset had started if you saw rowcounter == 0. This might have another advantage of not requiring the row processor to count the rows for itself, which I think many row processors would otherwise have to do. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] query cache
On Thu, Mar 29, 2012 at 4:57 PM, Tom Lane t...@sss.pgh.pa.us wrote: It's also probably worth keeping in mind the next time we bump the protocol version: it would be nice to have a way of doing prepare-bind-execute in a single protocol message, which I believe to be not possible at present. Huh? That's the standard way of doing it, actually. You send Prepare/Bind/Execute/Sync in one packet, and wait for results. That precludes some optimizations, like doing the whole thing under the same snapshot, since you don't know for sure when the Execute will be sent. And as a practical matter, it's slower. So there's some room for improvement there, any way you slice it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] HTTP Frontend? (and a brief thought on materialized views)
On Thu, Mar 29, 2012 at 11:04 PM, Andrew Dunstan and...@dunslane.net wrote: On 03/29/2012 10:37 AM, Dobes Vandermeer wrote: Hi guys, Something from Josh's recent blog post about summer of code clicked with me - the HTTP / SQL concept. 1. I've been in discussion with some people about adding simple JSON extract functions. We already have some (i.e. xpath()) for XML. 2. You might find htsql http://htsql.org/ interesting. As a reference, or should we just bundle / integrate that with PostgreSQL somehow?
Re: [HACKERS] HTTP Frontend? (and a brief thought on materialized views)
On Thu, Mar 29, 2012 at 6:25 PM, Dobes Vandermeer dob...@gmail.com wrote: 2. You might find htsql http://htsql.org/ interesting. As a reference, or should we just bundle / integrate that with PostgreSQL somehow? It's a totally different language layer without wide-spread popularity and, as of last year, still figuring out how to model updates and transactions, with a Python dependency. It is good work, but to build it in and to commit to maintaining it seems very premature, to my eyes. Also, it would tie someone -- possibly the htsql maintainers -- to an upgrade and maintenance policy not unlike Postgres itself, and that is a very expensive proposition while they are still figuring things out. -- fdr
Re: [HACKERS] HTTP Frontend? (and a brief thought on materialized views)
On Fri, Mar 30, 2012 at 3:59 AM, Daniel Farina dan...@heroku.com wrote: On Thu, Mar 29, 2012 at 12:57 PM, Daniel Farina dan...@heroku.com wrote: More technical concerns: * Protocol compression -- but a bit of sand in the gears is *which* compression -- for database workloads, the performance of zlib can be a meaningful bottleneck. I think if performance is the issue, people should use the native protocol. This HTTP thing should be more of a RAD / prototyping thing, I think. So people can be in their comfort zone when talking to the server. * Something similar to the HTTP Host header, so that one can route to databases without having to conflate database identity with the actual port being connected to. Yes, theoretically it can be done with weird startup packet gyrations, but that is firmly in the weird category. Isn't the URL good enough (/databases/dbname) or are you talking about having some sort of virtual host setup where you have multiple sets of databases available on the same port? Socialish (but no less important): * A standard frame extension format. For example, last I checked Postgres-XC, it required snapshot information to be passed, and it'd be nice that instead of having to hack the protocol that they could attach an X-Snapshot-Info header, or whatever. This could also be a nice way to pass out-of-band hint information of all sorts. I am sorry to admit I don't understand the terms frame extension format or Postgres-XC in this paragraph ... help? * HTTP -- and *probably* its hypothetical progeny -- are more common than FEBE packets, and a lot of incidental complexity of writing routers is reduced by the commonality of routing HTTP traffic. This is an interesting comment. I'm not sure how to test whether an HTTP based protocol will be better supported than a proprietary one, but I think we have enough other reasons that we can move forward. Well we have the reason that there's some kind of love affair with HTTP based protocols going on out there ...
might as well ride the wave while it's still rising (I hope). As for SPDY I can see how it may be helpful but as it is basically a different way to send HTTP requests (from what I understand) the migration to SPDY is mainly a matter of adding support for it to whatever HTTP library is used. Anyone have a thought on whether, for the HTTP server itself, it should be integrated right into the PostgreSQL server itself? Or would it be better to have a separate process that proxies requests to PostgreSQL using the existing protocol? Is there an API that can be used in both cases semi-transparently (i.e. the functions have the same name when linked right in, or when calling via a socket)? Cheers, Dobes
Re: [HACKERS] HTTP Frontend? (and a brief thought on materialized views)
On Fri, Mar 30, 2012 at 3:57 AM, Daniel Farina dan...@heroku.com wrote: On Thu, Mar 29, 2012 at 8:04 AM, Andrew Dunstan and...@dunslane.net wrote: Lastly, a case that cannot as easily be fixed without some more thinking is leveraging caching semantics of HTTP. I think people would really, really like that, if they could get away from having to hand-roll their own cache regeneration in common cases. I think this could be an interesting possibility. I wonder if the cost of a round-trip makes the cost of sending the actual data (vs a 304 response) irrelevant - as long as PostgreSQL is caching effectively internally it's possible it can send back the actual content as fast as it can calculate the ETag for it, so doing an extra query to check for changes could possibly slow things down, or at least eliminate the benefit. Probably worth trying, though. Supporting this automagically would require some kind of generic algorithm for calculating the Last-Modified time or ETag for a given query. As a default we may be able to just fall back on some internal global value that is guaranteed to change if the database has changed (I think the WAL files have some kind of serial number system we might use) so at the very least you could send back a 304 Not Modified if literally nothing in the database has changed. Narrowing that down to specific table timestamps might be possible, too, for simple queries. It depends what data is already available, I wouldn't want to add any extra bookkeeping for it. A more pragmatic approach may be to have the HTTP request include SQL code to generate an ETag or Last-Modified value to test with; the app could run that first and it would be used for caching. Something like calcLastModified=max(modified_date) on the query string.
Re: [HACKERS] Odd out of memory problem.
On tis, 2012-03-27 at 00:53 +0100, Greg Stark wrote: Hm. So my original plan was dependent on adding the state-merge function we've talked about in the past. Not all aggregate functions necessarily can support such a function but I think all or nearly all the builtin aggregates can. Certainly min, max, count, sum, avg, stddev, array_agg can, which are most of what people do. That would be a function which can take two state variables and produce a new state variable. This information could also be useful to have in PL/Proxy (or similar FDWs) to be able to integrate aggregate computation into the language. Currently, you always have to do the state merging yourself.
Re: [HACKERS] HTTP Frontend? (and a brief thought on materialized views)
On Thu, Mar 29, 2012 at 6:37 PM, Dobes Vandermeer dob...@gmail.com wrote: On Fri, Mar 30, 2012 at 3:59 AM, Daniel Farina dan...@heroku.com wrote: On Thu, Mar 29, 2012 at 12:57 PM, Daniel Farina dan...@heroku.com wrote: More technical concerns: * Protocol compression -- but a bit of sand in the gears is *which* compression -- for database workloads, the performance of zlib can be a meaningful bottleneck. I think if performance is the issue, people should use the native protocol. This HTTP thing should be more of a RAD / prototyping thing, I think. So people can be in their comfort zone when talking to the server. No. I do not think so. I think a reasonable solution (part of what MS is actually proposing to the IETF) is to make compression optional, or have clients support an LZ77 family format like Snappy. I get the impression that SPDY is waffling a little on this fact, it mandates compression, and definitely zlib, but is less heavy handed about pushing, say Snappy. Although I can understand why a Google-originated technology probably doesn't want to push another Google-originated implementation so hard, it would have been convenient for me for Snappy to have become a more common format. Isn't the URL good enough (/databases/dbname) or are you talking about having some sort of virtual host setup where you have multiple sets of databases available on the same port? Virtual hosts. Same port. * A standard frame extension format. For example, last I checked Postgres-XC, it required snapshot information to be passed, and it'd be nice that instead of having to hack the protocol that they could attach an X-Snapshot-Info header, or whatever. This could also be a nice way to pass out-of-band hint information of all sorts. I am sorry to admit I don't understand the terms frame extension format or Postgres-XC in this paragraph ... help? I'm being vague. Postgres-XC is a project to provide a shared-nothing sync-rep cluster for Postgres.
My last understanding of it is that it needed to pass snapshot information between nodes, and FEBE was expanded to make room for this, breaking compatibility, as well as probably being at least a small chore. It'd be nice if it wasn't necessary to do that and they had a much easier path to multiplex additional information into the connection. For my own purposes, I'm intensely interested in lacing the connection with: * EXPLAIN ANALYZE returns when the query has already run, getting both the actual timings *and* the results to the client. * Partition IDs, whereby you can find the right database and (potentially!) even influence how the queries are scoped to a tenant * Read-only vs. Write workload: As is well established, it's hard to know a-priori if a query is going to do a write. Fine. Let the client tag it, signal an error if something is wrong. Yes, theoretically all these features -- or just a general multiplexing scheme -- can be added to FEBE, but if we're even going to consider such an upheaval, maybe we can get a *lot* more bang for our buck by trying to avoid being unnecessarily different from the most common application-level protocol in existence, causing extraneous work for router and proxy authors. Notably, a vanilla Postgres database knows nothing about these extension headers. * HTTP -- and *probably* its hypothetical progeny -- are more common than FEBE packets, and a lot of incidental complexity of writing routers is reduced by the commonality of routing HTTP traffic. This is an interesting comment. I'm not sure how to test whether an HTTP based protocol will be better supported than a proprietary one, but I think we have enough other reasons that we can move forward. Well we have the reason that there's some kind of love affair with HTTP based protocols going on out there ... might as well ride the wave while it's still rising (I hope). At its core, what may be growing unnecessary is FEBE's own mechanism for delimiting messages.
All the other protocol actions -- such as shipping Binds, Executes, Describes, et al, are not going to be obsoleted or even changed by laying web-originated technologies under FEBE. Consider the wealth of projects, products, and services that filter HTTP vs FEBE, many quite old, now. In my mind, wave might be better rendered tsunami. The very real problem, as I see it, is that classic, stateless HTTP would be just too slow to be practical. As for SPDY I can see how it may be helpful but as it is basically a different way to send HTTP requests (from what I understand) the migration to SPDY is mainly a matter of adding support for it to whatever HTTP library is used. I think SPDY or like-protocols (there's only one other I can think of, the very recently introduced and hilariously branded SM from Microsoft. It's memorable, at least) are the only things that appear to give a crisp treatment to interactive, stateful workloads involving back-and-forth between client and
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
I've applied a modified form of the conname update patch. It seemed to me that the fault is really in the DBLINK_GET_CONN and DBLINK_GET_NAMED_CONN macros, which ought to be responsible for setting the surrounding function's conname variable along with conn, rconn, etc. There was actually a second error of the same type visible in the dblink regression test, which is also fixed by this more general method. Come to think of it, the patch was, to all appearances, merely a spot fix for a bug mingled in with the other part of the dblink patch. I totally agree with you. It should be dropped for this time and done another time. I'm sorry for bothering you with such a damn thing. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] HTTP Frontend? (and a brief thought on materialized views)
On Fri, Mar 30, 2012 at 10:55 AM, Daniel Farina dan...@heroku.com wrote: On Thu, Mar 29, 2012 at 6:37 PM, Dobes Vandermeer dob...@gmail.com wrote: On Fri, Mar 30, 2012 at 3:59 AM, Daniel Farina dan...@heroku.com wrote: On Thu, Mar 29, 2012 at 12:57 PM, Daniel Farina dan...@heroku.com wrote: More technical concerns: * Protocol compression -- but a bit of sand in the gears is *which* compression -- for database workloads, the performance of zlib can be a meaningful bottleneck. I think if performance is the issue, people should use the native protocol. No. I do not think so. I think a reasonable solution (part of what MS is actually proposing to the IETF) is to make compression optional, or have clients support an LZ77 family format like Snappy. I get the impression that SPDY is waffling a little on this fact, it mandates compression, and definitely zlib, but is less heavy handed about pushing, say Snappy. Although I can understand why a Google-originated technology probably doesn't want to push another Google-originated implementation so hard, it would have been convenient for me for Snappy to have become a more common format. Isn't the URL good enough (/databases/dbname) or are you talking about having some sort of virtual host setup where you have multiple sets of databases available on the same port? Virtual hosts. Same port. In that case, the frontend would not be tied to a specific PostgreSQL server, then? I think initially this might complicate things a bit, and you could solve it by putting an HTTP proxy in front to do the virtual hosts for you. * A standard frame extension format. For example, last I checked Postgres-XC, it required snapshot information to be passed, and it'd be nice that instead of having to hack the protocol that they could attach an X-Snapshot-Info header, or whatever. This could also be a nice way to pass out-of-band hint information of all sorts.
I am sorry to admit I don't understand the terms frame extension format or Postgres-XC in this paragraph ... help? It'd be nice if it wasn't necessary to do that and they had a much easier path to multiplex additional information into the connection. Ah, I get it - you want a way to add some extra information to the protocol in a backwards compatible way. HTTP (and SPDY) provides a standard way to do that. Makes sense. For my own purposes, I'm intensely interest in lacing the connection with: * EXPLAIN ANALYZE * Partition IDs * Read-only vs. Write workload I'll make a note of these and hash out the details a bit more once there's something working to add them to. As for SPDY I can see how it may be helpful but as it is basically a different way to send HTTP requests I think SPDY or like-protocols [...] give a crisp treatment to interactive, stateful workloads involving back-and-forth between client and server with multiplexing, fixing some problems with the hacks in HTTP-land from before. It sounds like at some level you're really talking about replacing the built-in protocol with SPDY because SPDY is possibly a better baseline than updating the existing protocol. That's an interesting idea, I think this project could evolve in that direction if there's demand for it.