Re: [HACKERS] Finer Extension dependencies
On Wed, Mar 28, 2012 at 8:52 AM, Robert Haas robertmh...@gmail.com wrote: On Wed, Mar 28, 2012 at 11:28 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: In practice, however, that sounds like a real pain in the neck. I would expect most people who were packaging extensions to handle a situation like this by forcing the user to provide the name of the function to be called, either via a control table or via a GUC. And once you've done that, it makes no sense to shove a feature dependency into the extension, because the user might very well just write an appropriate function themselves and tell the extension to call that. I don't know what you're talking about here; all I can say is that it has nothing to do with what the patch is implementing. What's in the patch is a way to depend on known versions of an extension rather than the extension wholesale, whatever the version. Using feature dependencies allows avoiding two particularly messy things: - imposing a version numbering scheme with a comparator - maintaining a separate feature matrix So instead of having to say foo version 1.2 is now doing buzz and having an extension depend on foo >= 1.2, you can say that your extension depends on the buzz feature. That's about it. Based on this information, it seems that I've misinterpreted the purpose of the patch. Since extension features seem to live in a global namespace, I assumed that the purpose of the patch was to allow both extension A and extension B to provide feature F, and extension C to depend on F rather than A or B specifically. What I understand you to be saying here is that's not really what you're trying to accomplish. Instead, you're just concerned about allowing some but not all versions of package A to provide feature F, so that other extensions can depend on F to get the specific version of A that they need (and not, as I had assumed, so that they can get either A or B). Let me think more about that.
Maybe I'm just easily confused here, or maybe there is something that should be changed in the code or docs; I'm not sure yet. On a more prosaic note, you seem to have made a mistake when generating the v5 diff. It includes reverts of a couple of unrelated, recent patches. WTF? On a further note, I have spent a heck of a lot more time reviewing other people's patches this CommitFest than you have, and I don't appreciate this. If you'd rather that I didn't spend time on this patch, I have plenty of other things to do with my time. Frankly I'm still against this patch. Since I started to review it I've never been convinced by the use case. Yeah, someone said it'd be useful to him, but as a developer of some PGXN modules I don't see it. I totally agree with Robert's point that a feature is not standardized and nobody can tell how you can depend on the feature in the end. Mind you, I've never heard of building a dependency by name as a string in other packaging systems. If you want to introduce the concept of version dependency rather than feature name dependency, do *it*; I don't think feature dependency solves it. I know that you, Dimitri, are working so hard on this and other patches, but it seems to me that the quality of both the design and the patch code is not adequate at this point in time. I think I agree we are not 100% happy with the current dependency system of extensions, but we need more time to think and make it a mature idea rather than rushing and pushing and dropping something premature. The cost we would pay if we rushed this into this release would be higher than what we'd get from it, I think. Thanks, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend for same-role
On Mon, Mar 26, 2012 at 12:14 AM, Jeff Davis pg...@j-davis.com wrote: On Tue, 2012-03-20 at 01:38 -0700, Daniel Farina wrote: On Thu, Mar 15, 2012 at 9:39 PM, Fujii Masao masao.fu...@gmail.com wrote: On Fri, Mar 16, 2012 at 8:14 AM, Daniel Farina dan...@heroku.com wrote: Parallel to pg_cancel_backend, it'd be nice to allow the user to just outright kill a backend that they own (politely, with a SIGTERM), aborting any transactions in progress, including the idle transaction, and closing the socket. +1 Here's a patch implementing the simple version, with no more guards against signal racing than have been seen previously. The more elaborate variants to close those races are being discussed in a parallel thread, but I thought I'd get this simple version out there. Review: After reading through the threads, it looks like there was no real objection to this approach -- pid recycling is not something we're concerned about. I think you're missing a doc update though, in func.sgml: For the less restrictive <function>pg_cancel_backend</>, the role of an active backend can be found from the <structfield>usename</structfield> column of the <structname>pg_stat_activity</structname> view. Also, high-availability.sgml makes reference to pg_cancel_backend(), and it might be worthwhile to add an "...and pg_terminate_backend()" there. Other than that, it looks good to me. Good comments. Patch attached to address the doc issues. The only iffy thing is that the paragraph "For the less restrictive..." I have opted to remove in its entirety. I think the documents are already pretty clear about the same-user rule, and I'm not sure if this is the right place for a crash-course on attributes in pg_stat_activity (but maybe it is). "...and pg_terminate_backend" seems exactly right. And I think now that the system post-patch doesn't have such a strange contrast between the ability to send SIGINT vs. SIGTERM, such a contrast may not be necessary.
I'm also not sure what the policy is about filling paragraphs in the manual. I filled one, which increases the sgml churn a bit. git (show|diff) --word-diff helps clean it up. -- fdr Extend-same-role-backend-management-to-pg_terminate_backend-v2.patch Description: Binary data
Re: [HACKERS] Finer Extension dependencies
Hitoshi Harada umi.tan...@gmail.com writes: Frankly I'm still against this patch. Since I started to review it I've never been convinced by the use case. Yeah, someone said it'd be useful to him, but as a developer of some PGXN modules I don't see it. I totally agree with Robert's point that a feature is not standardized and nobody can tell how you can depend on the feature in the end. Mind you, I've never heard of building a dependency by its Ok, we might need to find another word for the concept here. Will think, would appreciate native speakers' thought. name as a string in other packaging systems. If you want to introduce the concept of version dependency rather than feature name dependency, do *it*; I don't think feature dependency solves it. I don't want to introduce version dependency, because I don't think we need it. If you want to compare what we're doing here with, say, Debian packaging, then look at how they package libraries. The major version number is now part of the package name and you depend on that directly. So let's take the shortcut to directly depend on the “feature” name. For a PostgreSQL extension example, we could pick ip4r. That will soon include support for ipv6 (it's already done code-wise, missing a docs update). If you want to use ip4r for storing ipv6, you will simply require “ip6r” or whatever feature name is provided by the extension including it. If you really think this cannot be made to work in your use cases, please provide us with an example where it fails. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Finer Extension dependencies
On Thu, Mar 29, 2012 at 12:51 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Hitoshi Harada umi.tan...@gmail.com writes: Frankly I'm still against this patch. Since I started to review it I've never been convinced by the use case. Yeah, someone said it'd be useful to him, but as a developer of some PGXN modules I don't see it. I totally agree with Robert's point that a feature is not standardized and nobody can tell how you can depend on the feature in the end. Mind you, I've never heard of building a dependency by its Ok, we might need to find another word for the concept here. Will think, would appreciate native speakers' thought. name as a string in other packaging systems. If you want to introduce the concept of version dependency rather than feature name dependency, do *it*; I don't think feature dependency solves it. I don't want to introduce version dependency, because I don't think we need it. If you want to compare what we're doing here with, say, Debian packaging, then look at how they package libraries. The major version number is now part of the package name and you depend on that directly. So let's take the shortcut to directly depend on the “feature” name. For a PostgreSQL extension example, we could pick ip4r. That will soon include support for ipv6 (it's already done code-wise, missing a docs update). If you want to use ip4r for storing ipv6, you will simply require “ip6r” or whatever feature name is provided by the extension including it. So my question is why you cannot depend on ip4r in that case. If some version of the module introduces ipv6, then let's depend on that version. It doesn't explain why a string feature name is needed. Thanks, -- Hitoshi Harada
Re: [HACKERS] Standbys, txid_current_snapshot, wraparound
On Wed, Mar 28, 2012 at 10:54:58PM +0100, Simon Riggs wrote: On Wed, Mar 28, 2012 at 10:24 PM, Simon Riggs si...@2ndquadrant.com wrote: On Wed, Mar 28, 2012 at 9:48 PM, Marko Kreen mark...@gmail.com wrote: On Fri, Mar 23, 2012 at 08:52:40AM +, Simon Riggs wrote: Master: pg_controldata - OK, txid_current_snapshot() - OK. Standby: pg_controldata - OK, txid_current_snapshot() - lower value. On the Skytools list there is a report about a master with slaves, but the lower value appears on the master too: http://lists.pgfoundry.org/pipermail/skytools-users/2012-March/001601.html Cc'd original reporter too. Thanks. Am looking. I can't see how this could happen on the master at all. On the standby, it can happen if we skip restartpoints for about a couple of billion xids. Which would be a problem. More on this tomorrow. I can't find a place where WAL replay updates values under XLogCtl. If that really does not happen, that would explain why standbys can see a wrong epoch. No clue yet how the master can get broken. -- marko
Re: [HACKERS] Finer Extension dependencies
Hitoshi Harada umi.tan...@gmail.com writes: So my question is why you cannot depend on ip4r in that case. If some version of the module introduces ipv6, then let's depend on that version. It doesn't explain why a string feature name is needed. The only operator we have to compare version strings in PostgreSQL extensions is string equality. That's because we don't need more, and designing a version scheme policy would be far more work than any benefit I could imagine. And as we didn't enforce a version naming policy in 9.1, it could be argued that it's too late, too. When you say 'requires = ip6r' you are depending on *any* version of the extension that is providing it, whatever its version string. You don't have to know that '1.05' < '1.06' < '1.1' and you don't have to know that the first version with ipv6 support was called '1.06'. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
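The provides/requires scheme being discussed could be sketched in extension control files roughly as follows. This is an illustration of the idea only: the file contents are hypothetical, the `provides` parameter is the patch's proposal (not released syntax), and the version string follows the '1.06' example from this thread.

```ini
# ip4r.control -- hypothetical sketch following this thread's proposal
default_version = '1.06'
# Feature names this version provides; per the discussion, the extension's
# own name ('ip4r') is registered as a feature even if not listed here.
provides = 'ip6r'

# myext.control -- an extension that needs ipv6 support
# Depends on a feature name, not a version string, so any ip4r release
# that provides 'ip6r' satisfies it; only string equality is needed.
requires = 'ip6r'
```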
Re: [HACKERS] Finer Extension dependencies
Hi, Thanks for your review! Robert Haas robertmh...@gmail.com writes: I think the lack of pg_upgrade support is a must-fix before commit. I thought that would only be a TODO for 9.2 to 9.3 upgrades. When upgrading from 9.1 to 9.2, pg_upgrade will directly stuff extensions using InsertExtensionTuple() with an empty features list. That will call insert_extension_features() which ensures that the extension name is registered as a feature even when not explicitly listed (backward compatibility with control files). So I think we're covered now, but still need to revisit the issue later. I edited the comment this way: List *features = NIL; /* 9.3 should get features from catalogs */ Don't we need some psql support for listing the features provided by an installed extension? And an available extension? It's already in the pg_available_extension_versions system view, which is not covered by psql (yet). Do we want a new \dx derived command? get_extension_feature_oids seems to be misnamed, since it returns only one OID, not more than one, and it also returns the feature OID. At a minimum, I think you need to delete the s from the function name; possibly it should be renamed to lookup_extension_feature or somesuch. Done. List *requires; /* names of prerequisite extensions */ + List *provides; /* names of provided features */ Comment in requires line of above hunk should be updated to say "prerequisite features". Done. - Useless hunk. That must be my editor adding a final line when I visit files… and I can't seem to be able to prevent this hunk from happening? +errmsg("parameter \"%s\" must be a list of extension names", This error, which relates to the parsing of the provides list, should say "extension features", and the existing code for requires needs to be updated to say that as well. Done.
+errmsg("extension feature \"%s\" already exists [%u]", + feature, featoid))); That [%u] at the end there does not conform to our message style guidelines, and I think the rest of the message could be improved a bit as well. I think maybe it'd be appropriate to work the name of the other extension into the message, maybe something like: "extension %s provides feature %s, but existing extension %s already provides this feature" Done. +extern char * get_extension_feature_name(Oid featoid); Extra space. Fixed. + if (strcmp(curreq,pcontrol->name) != 0) Missing space (after the comma). Fixed. OCLASS_EXTENSION_FEATURE should probably be added to the penultimate switch case in ATExecAlterColumnType. Right, done now. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Standbys, txid_current_snapshot, wraparound
On Wed, Mar 28, 2012 at 10:54 PM, Simon Riggs si...@2ndquadrant.com wrote: On Wed, Mar 28, 2012 at 10:24 PM, Simon Riggs si...@2ndquadrant.com wrote: On Wed, Mar 28, 2012 at 9:48 PM, Marko Kreen mark...@gmail.com wrote: On Fri, Mar 23, 2012 at 08:52:40AM +, Simon Riggs wrote: Master: pg_controldata - OK, txid_current_snapshot() - OK. Standby: pg_controldata - OK, txid_current_snapshot() - lower value. On the Skytools list there is a report about a master with slaves, but the lower value appears on the master too: http://lists.pgfoundry.org/pipermail/skytools-users/2012-March/001601.html Cc'd original reporter too. Thanks. Am looking. I can't see how this could happen on the master at all. On the standby, it can happen if we skip restartpoints for about a couple of billion xids. Which would be a problem. More on this tomorrow. I've not been able to recreate the problem up till now. But knowing there is a bug helps develop a theory based upon the code. CreateCheckpoint() increments the epoch on the master at the next checkpoint after wraparound. (I'd be happier if there was an explicit link between those two points; there's not but I can't yet see a problem). When the standby receives the checkpoint record, it stores the information in 2 places: i) directly into ControlFile->checkPointCopy ii) and then into XLogCtl when a safe restartpoint occurs It's possible that a safe restartpoint could be delayed. When that happens, the XLogCtl copy grows stale. If the delay is long enough, then the NextXid counter will increase and will eventually be higher than the last XLogCtl copy, causing the epoch returned by GetNextXidAndEpoch() to go backwards by 1. If it is delayed even further it would wrap again and increase again by one, then decrease, then increase. Given enough time and/or a very busy server using lots of xids. At the same time, when we do UpdateControlFile() the other copy of the epoch is written to the controlfile, so the controlfile shows the accurate value for the epoch.
So I can explain how we get two different answers from the standby, and I can explain how the error is very hard to reproduce and apparently transient. I can't explain anything involving the master. The key is "what delays the restartpoint?". And the answer there is probably bugs in index code which purport to see invalid pages when they probably shouldn't be there. So a REINDEX would likely help. I'll look at a patch to improve this. Definite bug and will be backpatched. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
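The stale-restartpoint theory above can be illustrated with a toy model of the epoch computation. This is a deliberate simplification of what GetNextXidAndEpoch() does, not the actual code; the function and argument names here are invented:

```python
def epoch_from_checkpoint_copy(next_xid, ckpt_xid, ckpt_epoch):
    """Derive the current epoch from the last checkpoint data cached in
    shared memory: if the 32-bit xid counter has wrapped since that
    checkpoint (next_xid is now below ckpt_xid), the epoch is one higher
    than the epoch recorded at the checkpoint."""
    epoch = ckpt_epoch
    if next_xid < ckpt_xid:
        epoch += 1
    return epoch

# Checkpoint cached just before a wraparound: xid 4,000,000,000 in epoch 5.
# Shortly after the wrap, the derived epoch is correctly 6:
assert epoch_from_checkpoint_copy(100, 4_000_000_000, 5) == 6

# But if restartpoints are skipped and the cached copy goes stale while
# billions more xids are consumed, next_xid climbs back above the stale
# ckpt_xid, and the derived epoch drops back to 5 even though the real
# epoch is still 6 -- the transient "epoch goes backwards" symptom:
assert epoch_from_checkpoint_copy(4_000_000_100, 4_000_000_000, 5) == 5
```

Refreshing the cached checkpoint data (ckpt_xid and ckpt_epoch together) makes the same call return the correct epoch again, which is why the symptom is so hard to reproduce.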
Re: [HACKERS] Standbys, txid_current_snapshot, wraparound
On Thu, Mar 29, 2012 at 10:37:54AM +0100, Simon Riggs wrote: When the standby receives the checkpoint record, it stores the information in 2 places: i) directly into ControlFile->checkPointCopy ii) and then into XLogCtl when a safe restartpoint occurs In RecoveryRestartPoint() I see: - memcpy(XLogCtl->lastCheckPoint, checkPoint, sizeof(CheckPoint)); but I still don't see how the ckptXid* values are updated? What am I missing? -- marko
Re: [HACKERS] patch for parallel pg_dump
On Wed, Mar 28, 2012 at 9:54 PM, Joachim Wieland j...@mcknight.de wrote: On Wed, Mar 28, 2012 at 1:46 PM, Robert Haas robertmh...@gmail.com wrote: I'm wondering if we really need this much complexity around shutting down workers. I'm not sure I understand why we need both a hard and a soft method of shutting them down. At least on non-Windows systems, it seems like it would be entirely sufficient to just send a SIGTERM when you want them to die. They don't even need to catch it; they can just die. At least on my Linux test system, even if all pg_dump processes are gone, the server happily continues sending data. When I strace an individual backend process, I see a lot of "Broken pipe" writes, but that doesn't stop it from just writing out the whole table to a closed file descriptor. This is a 9.0-latest server. Wow, yuck. At least now I understand why you're doing it like that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On Wed, Mar 28, 2012 at 10:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: The SELECT INTO tests all fail, but we know the reason why (the testbed isn't expecting them to result in creating separate entries for the utility statement and the underlying plannable SELECT). This might be a dumb idea, but for a quick hack, could we just rig SELECT INTO, CREATE TABLE AS, and EXPLAIN not to create entries for themselves at all, without suppressing creation of an entry for the underlying query? The output might be slightly misleading but it wouldn't be broken, and I'm disinclined to want to spend a lot of time doing fine-tuning right now. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Command Triggers patch v18
On Wed, Mar 28, 2012 at 3:41 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: What about:
create command trigger foo before prepare alter table …
create command trigger foo before start of alter table …
create command trigger foo before execute alter table …
create command trigger foo before end of alter table …
create command trigger foo when prepare alter table …
create command trigger foo when start alter table …
create command trigger foo when execute of alter table …
create command trigger foo when end of alter table …
create command trigger foo preceding alter table …
create command trigger foo preceding alter table … deferred
create command trigger foo preceding alter table … immediate
create command trigger foo following parser of alter table …
create command trigger foo preceding execute of alter table …
create command trigger foo following alter table …
create command trigger foo before alter table nowait …
I'm not sure how many hooks we can really export here, but I guess we have enough existing keywords to describe when they get to run (parser, mapping, lock, check, begin, analyze, access, disable, not exists, do, end, escape, extract, fetch, following, search…) I am sure that we could find a way to beat this with a stick until it behaves, but I don't really like that idea. It seems to me to be a case of adding useless parser bloat. Prior to 9.0, when we introduced the new EXPLAIN syntax, new EXPLAIN options were repeatedly proposed and partly rejected on the grounds that they weren't important enough to justify adding new keywords. We've now got EXPLAIN (COSTS, BUFFERS, TIMING, FORMAT) and there will probably be more in the future, and the parser overhead of adding a new one is zero.
I think we should learn from that lesson: when you may want to have a bunch of different options and it's not too clear what they're all going to be named, designing things in a way that avoids a dependency on the main parser having the right keywords leads to less patch rejection and more getting stuff done. I've also had another general thought about safety: we need to make sure that we're only firing triggers from places where it's safe for user code to perform arbitrary actions. In particular, we have to assume that any triggers we invoke will AcceptInvalidationMessages() and CommandCounterIncrement(); and we probably can't do it at all (or at least not without some additional safeguard) in places where the code does CheckTableNotInUse() up through the point where it's once again safe to access the table, since the trigger might try. We also I've been trying to do that already. need to think about re-entrancy: that is, in theory, the trigger might execute any other DDL command - e.g. it might drop the table that we've been asked to rename. I suspect that some of the current BEFORE That's why I implemented ALTER COMMAND TRIGGER ... SET DISABLE in the first place, so that you could run the same command from the trigger itself without infinite recursion. trigger stuff might fall down on that last one, since the existing code not-unreasonably expects that once it's locked the table, the catalog entries can't go away. Maybe it all happens to work out OK, but I don't think we can count on that. I didn't think of DROP TABLE in a command trigger running BEFORE ALTER TABLE, say. That looks evil. It could be documented as yet another way to shoot yourself in the foot though? Well, it depends somewhat on how it fails. If it fails by crashing the server, for example, I don't think that's going to fly.
My suspicion is that it won't do that, but what it might do is fail in some pretty odd and unpredictable ways, possibly leading to catalog corruption, which I don't feel too good about either. Think about not just ALTER vs. DROP but also ALTER vs. ALTER. It's probably easier to add guards against this kind of thing than it is to prove that it's not going to do anything too wacky; the obvious idea that comes to mind is to bump the command counter after returning from the last trigger (if that doesn't happen already) and then verify that the tuple is still present and has whatever other properties we've already checked and are counting on, and if not chuck an error. I think that for a first version of this feature it probably would be smart to trim this back to something that doesn't involve surgery on the guts of every command; then, we can extend incrementally. Nothing you've proposed seems impossible to me, but most of the really interesting things are hard, and it would be much easier to handle patches intended to cater to more complex use cases if the basic infrastructure were already committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Standbys, txid_current_snapshot, wraparound
On Thu, Mar 29, 2012 at 11:12 AM, Marko Kreen mark...@gmail.com wrote: On Thu, Mar 29, 2012 at 10:37:54AM +0100, Simon Riggs wrote: When the standby receives the checkpoint record, it stores the information in 2 places: i) directly into ControlFile->checkPointCopy ii) and then into XLogCtl when a safe restartpoint occurs In RecoveryRestartPoint() I see: - memcpy(XLogCtl->lastCheckPoint, checkPoint, sizeof(CheckPoint)); but I still don't see how the ckptXid* values are updated? What am I missing? It's updated on the master and then copied into place. Patch coming in a few hours. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 29 March 2012 02:09, Tom Lane t...@sss.pgh.pa.us wrote: Thanks. I've committed the patch along with the docs, after rather heavy editorialization. Thank you. 1. What to do with EXPLAIN, SELECT INTO, etc. We had talked about tweaking the behavior of statement nesting and some other possibilities. I think clearly this could use improvement but I'm not sure just how. (Note: I left out the part of your docs patch that attempted to explain the current behavior, since I think we should fix it not document it.) Yeah, this is kind of unsatisfactory. Nobody would expect the module to behave this way. On the other hand, users probably aren't hugely interested in this information. I'm still kind of attached to the idea of exposing the hash value in the view. It could be handy in replication situations, to be able to aggregate statistics across the cluster (assuming you're using streaming replication and not a trigger based system). You need a relatively stable identifier to be able to do that. You've already sort-of promised to not break the format in point releases, because it is serialised to disk, and may have to persist for months or years. Also, it will drive home the reality of what's going on in situations like this (from the docs): In some cases, queries with visibly different texts might get merged into a single pg_stat_statements entry. Normally this will happen only for semantically equivalent queries, but there is a small chance of hash collisions causing unrelated queries to be merged into one entry. (This cannot happen for queries belonging to different users or databases, however.) Since the hash value is computed on the post-parse-analysis representation of the queries, the opposite is also possible: queries with identical texts might appear as separate entries, if they have different meanings as a result of factors such as different search_path settings. 2. Whether and how to adjust the aging-out of sticky entries. 
This seems like a research project, but the code impact should be quite localized. As I said, I'll try and give it some thought, and do some experiments. BTW, I eventually concluded that the parameterization testing we were worried about before was a red herring. As committed, the patch tries to store a normalized string if it found any deletable constants, full stop. This seems to me to be correct behavior because the presence of constants is exactly what makes the string normalizable, and such constants *will* be ignored in the hash calculation no matter whether there are other parameters or not. Yeah, that aspect of not canonicalising parametrised entries had bothered me. I guess we're better off gracefully handling the problem across the board, rather than attempting to band-aid the problem up by just not having speculative hashtable entries in cases where they arguably are not so useful. Avoiding canonicalising those constants was somewhat misleading. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Finer Extension dependencies
On Thu, Mar 29, 2012 at 4:37 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Hitoshi Harada umi.tan...@gmail.com writes: So my question is why you cannot depend on ip4r in that case. If some version of the module introduces ipv6, then let's depend on that version. It doesn't explain why a string feature name is needed. The only operator we have to compare version strings in PostgreSQL extensions is string equality. That's because we don't need more, and designing a version scheme policy would be far more work than any benefit I could imagine. And as we didn't enforce a version naming policy in 9.1, it could be argued that it's too late, too. When you say 'requires = ip6r' you are depending on *any* version of the extension that is providing it, whatever its version string. You don't have to know that '1.05' < '1.06' < '1.1' and you don't have to know that the first version with ipv6 support was called '1.06'. If I recall previous discussion correctly, there's fairly broad consensus among people Tom talks to that dependencies on specific versions are inferior to dependencies on features provided by those versions. That having been said, it might be hard for package authors to know what features other packages want to depend on; and perhaps any version of a package might add a new feature that someone might want. So it might be that, after we add this feature, nearly all packagers will do one of two things: 1. Not add a provides line at all, thus making it impossible for anyone else to depend on specific versions; or 2. Add a new feature to the provides line with every release that does anything other than fix bugs, leading to: provides = foobar-1.1, foobar-1.2, foobar-1.3, foobar-1.4, foobar-1.5, foobar-1.6, foobar-2.0, foobar-2.1, foobar-2.2, foobar-2.3, foobar-3.0, foobar-3.1 If that turns out to be the case, we may regret designing the feature this way.
Again, the fact that packaging is part of shipping a PostgreSQL extension rather than something that gets tacked on after the fact is not an irrelevant detail. Red Hat can rearrange all the tags they ship at will every time they put out a new version of their distribution, they can enforce uniform standards for how those tags are named, and, maybe most important of all, they don't need to add a tag for every version of the feature that someone *might* want to depend on - only the things that someone *does* want to depend on. Now in spite of all that I'm not sure this is a bad way to go: maybe trying to solve the problem and ending up with something imperfect is better than throwing our hands up in the air and doing nothing. But on the other hand I don't think Harada-san's comments are entirely ill-founded either: how sure are we that this is the right way to go, and what feedback do we have from people who are using this feature that leads us to think this is adequate? Does anyone else have an opinion on this? I think that technically this patch can be polished well enough to commit in the time we have available, but the question of whether it's the right design is harder, and I don't want that to be my call alone. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
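Under the scheme being debated, dependency resolution reduces to string-equality membership tests on feature names rather than version comparisons. A toy sketch (the data, function name, and version strings are invented for illustration, reusing the ip4r example from earlier in the thread):

```python
# Available extension versions and the feature names each one provides.
AVAILABLE = {
    ("ip4r", "1.05"): {"ip4r"},
    ("ip4r", "1.06"): {"ip4r", "ip6r"},  # first version providing ipv6
    ("ip4r", "1.1"):  {"ip4r", "ip6r"},
}

def candidates(required_feature):
    """Return every (extension, version) whose provides set covers the
    required feature. No version comparator is needed: only string
    equality on feature names, as Dimitri argues above."""
    return sorted(name_ver for name_ver, feats in AVAILABLE.items()
                  if required_feature in feats)

# 'requires = ip6r' matches any version providing that feature:
assert candidates("ip6r") == [("ip4r", "1.06"), ("ip4r", "1.1")]
# Depending on the extension's own name matches every version:
assert candidates("ip4r") == [("ip4r", "1.05"), ("ip4r", "1.06"), ("ip4r", "1.1")]
```

Robert's worry is visible in this model too: the scheme only works if packagers keep the provides sets meaningful instead of empty or exhaustively enumerating every release.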
Re: [HACKERS] Finer Extension dependencies
Robert Haas wrote: I think that technically this patch can be polished well enough to commit in the time we have available, but the question of whether it's the right design is harder, and I don't want that to be my call alone.

I gather from previous posts that the intent isn't to allow different packages from different authors to provide a common and compatible feature; but what happens in the current design if someone accidentally or maliciously produces an extension which provides the same feature name as another extension? Would we need some registry?

-Kevin
Re: [HACKERS] ECPG FETCH readahead
On 2012-03-29 12:59, Boszormenyi Zoltan wrote: On 2012-03-29 02:43, Noah Misch wrote: On Sat, Mar 24, 2012 at 10:49:07AM +0100, Boszormenyi Zoltan wrote:

+the window size may be modified by setting the <literal>ECPGFETCHSZ</literal>
+environment variable to a different value.

I had in mind that DECLARE statements adorned with READAHEAD syntax would always behave precisely as written, independent of ECPGFETCHSZ or ecpg -R. Unadorned DECLARE statements would use ECPGFETCHSZ if set, then the value passed to ecpg -R if provided, and finally a value of 1 (no readahead) in the absence of both ECPGFETCHSZ and ecpg -R. Did you do it differently for a particular reason? I don't particularly object to what you've implemented, but I'd be curious to hear your thinking.

What I had in mind was: NO READAHEAD == READAHEAD 0. This will translate into a 1-tuple-sized readahead window that cannot be modified by ECPGFETCHSZ. After the core patch, this is not totally true yet. The NO READAHEAD cursors don't go through the new cursor functions at all. But cursor.c is prepared for the second patch that makes all cursors use the caching code.

Best regards, Zoltán Böszörményi

-- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Re: [HACKERS] Command Triggers patch v18
Robert Haas robertmh...@gmail.com writes: I am sure that we could find a way to beat this with a stick until it behaves, but I don't really like that idea. It seems to me to be a [...] we should learn from that lesson: when you may want to have a bunch of

I first wanted to ensure that reusing existing parser keywords wouldn't be good enough for our case, as I find that solution more elegant. If we can't think of future places where we would want to add support for calling command triggers, then yes, you're right, something more flexible is needed.

I'll go make that happen, and still need input here. We first want to have command triggers on specific commands or ANY command, and we want to implement 3 places from where to fire them. Here's a new syntax proposal to cope with that:

  create command trigger before COMMAND_STEP of alter table execute procedure snitch();

- before the process utility switch, with only command tag and parse tree:
  create command trigger foo before start of alter table execute procedure snitch();

- before running the command, after having done basic error checks, security checks, name lookups and locking, with all information:
  create command trigger before execute of alter table execute procedure snitch();

- after having run the command:
  create command trigger foo before end of alter table execute procedure snitch();

Some other places make sense to add support for, but working within the deadlines we have and with the current patch, those 3 points are what's easy enough to implement now.

I didn't think of DROP TABLE in a command trigger running BEFORE ALTER TABLE, say. That looks evil. It could be documented as yet another way to shoot yourself in the foot, though?

I think that for a first version of this feature it probably would be smart to trim this back to something that doesn't involve surgery on the guts of every command; then, we can extend incrementally.
A simple enough solution here would be to register a list of disabled commands per object id before running the user defined function, and check against that list before allowing it to run. As we have command trigger support for all the commands we want to be able to block, the lookup and ereport() call can be implemented in InitCommandContext() where we already apply some limitations (like no command triggers on command trigger commands). Once that is in place it's possible to refine the pre-condition checks on a per-command basis and stop adding the command to the blacklist.

Regards,

-- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
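The blacklist idea sketched above can be illustrated outside of PostgreSQL. This is not the patch's C code; it is a toy model (invented names, object ids, and command tags) of registering disabled commands per trigger and rejecting them before the user-defined function runs:

```python
# Illustrative sketch only: a per-trigger blacklist of command tags,
# checked before a command trigger's function is allowed to run a
# command -- mirroring the InitCommandContext() idea from the mail.

blacklist = {}  # trigger object id -> set of disabled command tags

def register_disabled(trigger_oid, command_tags):
    """Record commands that may not be run from this trigger."""
    blacklist.setdefault(trigger_oid, set()).update(command_tags)

def check_command(trigger_oid, command_tag):
    """Raise if the command is blacklisted for this trigger."""
    if command_tag in blacklist.get(trigger_oid, set()):
        raise RuntimeError(
            f'{command_tag} is not allowed from this command trigger')

register_disabled(16384, {"DROP TABLE"})
check_command(16384, "ALTER TABLE")   # allowed, no error
try:
    check_command(16384, "DROP TABLE")
except RuntimeError as err:
    print(err)
```

Refining the pre-condition checks per command then amounts to handling a tag specially and no longer registering it in the blacklist.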
Re: [HACKERS] Finer Extension dependencies
On Thu, Mar 29, 2012 at 1:01 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: I gather from previous posts that the intent isn't to allow different packages from different authors to provide a common and compatible feature; but what happens in the current design if someone accidentally or maliciously produces an extension which provides the same feature name as another extension? Would we need some registry?

A good (documented) convention should make that unnecessary, such as packagename.feature, for example: provides = hstore.populate_record. Or something along those lines. Or maybe even prefix it with a Java-style inverse URL of the package's author.

Cheers,

Bene
Re: [HACKERS] Finer Extension dependencies
Kevin Grittner kevin.gritt...@wicourts.gov writes: I gather from previous posts that the intent isn't to allow different packages from different authors to provide a common and compatible feature; but what happens in the current design if someone accidentally or maliciously produces an extension which provides the same feature name as another extension?

It's not about that, it's more like the features/require/provide concepts in Lisp, except that we're not using them to load files. The goal really is to avoid a feature matrix and a version policy with comparators, yet be able to depend on features that got implemented after the first release of an extension.

Would we need some registry?

That being said, we still have a single namespace for extensions and their features, so a registry would help, yes.

-- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Finer Extension dependencies
On Thu, Mar 29, 2012 at 8:01 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Robert Haas wrote: I think that technically this patch can be polished well enough to commit in the time we have available, but the question of whether it's the right design is harder, and I don't want that to be my call alone. I gather from previous posts that the intent isn't to allow different packages from different authors to provide a common and compatible feature; but what happens in the current design if someone accidentally or maliciously produces an extension which provides the same feature name as another extension? Would we need some registry?

One thing I was thinking about was whether we should restrict feature names to be of some specific form, like extension_name:feature_name. That would address this issue, and would also keep people from thinking of this as an alternatives mechanism, as I did. Of course, that doesn't prevent someone from publishing an ip4r module that erases your hard disk, but there's nothing much we can do about that problem from within core PostgreSQL.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
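The extension_name:feature_name restriction suggested above is easy to picture as a validation rule. This is a hypothetical sketch, not anything in the patch; the allowed character set is an assumption modeled loosely on unquoted SQL identifiers:

```python
import re

# Hypothetical validator for the suggestion that feature names be
# restricted to the form extension_name:feature_name. The identifier
# syntax here (lowercase letters, digits, underscores) is an assumption.
FEATURE_RE = re.compile(r'[a-z_][a-z0-9_]*:[a-z_][a-z0-9_]*')

def is_valid_feature(name: str) -> bool:
    """True if name looks like extension_name:feature_name."""
    return FEATURE_RE.fullmatch(name) is not None

print(is_valid_feature("hstore:populate_record"))  # True
print(is_valid_feature("populate_record"))         # False: no namespace
```

Requiring the namespace prefix makes accidental collisions between unrelated extensions impossible without needing any central registry, though it does nothing against a deliberately misleading extension name.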
Re: [HACKERS] Command Triggers patch v18
On 29 March 2012 13:30, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: I'll go make that happen, and still need input here. We first want to have command triggers on specific commands or ANY command, and we want to implement 3 places from where to fire them. Here's a new syntax proposal to cope with that:

  create command trigger before COMMAND_STEP of alter table execute procedure snitch();

- before the process utility switch, with only command tag and parse tree:
  create command trigger foo before start of alter table execute procedure snitch();

- before running the command, after having done basic error checks, security checks, name lookups and locking, with all information:
  create command trigger before execute of alter table execute procedure snitch();

- after having run the command:
  create command trigger foo before end of alter table execute procedure snitch();

Is it necessary to add this complexity in this version? Can't we keep it simple but in a way that allows the addition of this later? The testing of all these new combinations sounds like a lot of work.

-- Thom
Re: [HACKERS] Standbys, txid_current_snapshot, wraparound
On Thu, Mar 29, 2012 at 12:06 PM, Simon Riggs si...@2ndquadrant.com wrote: Patch coming in a few hours.

This is more straightforward than I was thinking. We just need to initialise XLogCtl at the right place.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ff7f521..d268014 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6320,6 +6320,10 @@ StartupXLOG(void)
 	/* No need to hold ControlFileLock yet, we aren't up far enough */
 	UpdateControlFile();
 
+	/* initialize shared-memory copy of latest checkpoint XID/epoch */
+	XLogCtl->ckptXidEpoch = ControlFile->checkPointCopy.nextXidEpoch;
+	XLogCtl->ckptXid = ControlFile->checkPointCopy.nextXid;
+
 	/* initialize our local copy of minRecoveryPoint */
 	minRecoveryPoint = ControlFile->minRecoveryPoint;
 
@@ -6915,10 +6919,6 @@ StartupXLOG(void)
 	/* start the archive_timeout timer running */
 	XLogCtl->Write.lastSegSwitchTime = (pg_time_t) time(NULL);
 
-	/* initialize shared-memory copy of latest checkpoint XID/epoch */
-	XLogCtl->ckptXidEpoch = ControlFile->checkPointCopy.nextXidEpoch;
-	XLogCtl->ckptXid = ControlFile->checkPointCopy.nextXid;
-
 	/* also initialize latestCompletedXid, to nextXid - 1 */
 	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 	ShmemVariableCache->latestCompletedXid = ShmemVariableCache->nextXid;
@@ -8601,6 +8601,17 @@ xlog_redo(XLogRecPtr lsn, XLogRecord *record)
 		ControlFile->checkPointCopy.nextXidEpoch = checkPoint.nextXidEpoch;
 		ControlFile->checkPointCopy.nextXid = checkPoint.nextXid;
 
+		/* Update shared-memory copy of checkpoint XID/epoch */
+		{
+			/* use volatile pointer to prevent code rearrangement */
+			volatile XLogCtlData *xlogctl = XLogCtl;
+
+			SpinLockAcquire(&xlogctl->info_lck);
+			xlogctl->ckptXidEpoch = checkPoint.nextXidEpoch;
+			xlogctl->ckptXid = checkPoint.nextXid;
+			SpinLockRelease(&xlogctl->info_lck);
+		}
+
 		/*
 		 * TLI may change in a shutdown checkpoint, but it shouldn't decrease
 		 */
@@ -8645,6 +8656,17 @@ xlog_redo(XLogRecPtr lsn, XLogRecord *record)
 		ControlFile->checkPointCopy.nextXidEpoch = checkPoint.nextXidEpoch;
 		ControlFile->checkPointCopy.nextXid = checkPoint.nextXid;
 
+		/* Update shared-memory copy of checkpoint XID/epoch */
+		{
+			/* use volatile pointer to prevent code rearrangement */
+			volatile XLogCtlData *xlogctl = XLogCtl;
+
+			SpinLockAcquire(&xlogctl->info_lck);
+			xlogctl->ckptXidEpoch = checkPoint.nextXidEpoch;
+			xlogctl->ckptXid = checkPoint.nextXid;
+			SpinLockRelease(&xlogctl->info_lck);
+		}
+
 		/* TLI should not change in an on-line checkpoint */
 		if (checkPoint.ThisTimeLineID != ThisTimeLineID)
 			ereport(PANIC,
Re: [HACKERS] Command Triggers patch v18
On Thu, Mar 29, 2012 at 9:01 AM, Thom Brown t...@linux.com wrote: On 29 March 2012 13:30, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: I'll go make that happen, and still need input here. We first want to have command triggers on specific commands or ANY command, and we want to implement 3 places from where to fire them. Here's a new syntax proposal to cope with that:

  create command trigger before COMMAND_STEP of alter table execute procedure snitch();

- before the process utility switch, with only command tag and parse tree:
  create command trigger foo before start of alter table execute procedure snitch();

- before running the command, after having done basic error checks, security checks, name lookups and locking, with all information:
  create command trigger before execute of alter table execute procedure snitch();

- after having run the command:
  create command trigger foo before end of alter table execute procedure snitch();

Is it necessary to add this complexity in this version? Can't we keep it simple but in a way that allows the addition of this later? The testing of all these new combinations sounds like a lot of work.

I concur. This is way more complicated than we should be trying to do in version 1.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Standbys, txid_current_snapshot, wraparound
On Thu, Mar 29, 2012 at 02:46:23PM +0100, Simon Riggs wrote: On Thu, Mar 29, 2012 at 12:06 PM, Simon Riggs si...@2ndquadrant.com wrote: Patch coming in a few hours. This is more straightforward than I was thinking. We just need to initialise XLogCtl at the right place.

Looks good to me. That should fix the problem with standbys then.

Next question: how can flipping archive_mode on and off, with restarts, near the wraparound point, break the epoch on the master? http://lists.pgfoundry.org/pipermail/skytools-users/2012-March/001609.html

Any ideas? Could WAL playback from a backend with a different archive_mode setting cause it somehow?

-- marko
Re: [HACKERS] Standbys, txid_current_snapshot, wraparound
On Thu, Mar 29, 2012 at 3:04 PM, Marko Kreen mark...@gmail.com wrote: On Thu, Mar 29, 2012 at 02:46:23PM +0100, Simon Riggs wrote: On Thu, Mar 29, 2012 at 12:06 PM, Simon Riggs si...@2ndquadrant.com wrote: Patch coming in a few hours. This is more straightforward than I was thinking. We just need to initialise XLogCtl at the right place. Looks good to me. That should fix the problem with standbys then. Next question: how can flipping archive_mode on and off, with restarts, near wraparound point, break epoch on master? http://lists.pgfoundry.org/pipermail/skytools-users/2012-March/001609.html Any ideas? Could WAL playback from a backend with a different archive_mode setting cause it somehow?

The current code would allow it, if you managed to go for 4 billion xids between checkpoints. If that happened, it would give an off-by-one error each time it happened. I don't believe that is possible.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
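For readers following the epoch discussion: 64-bit txids of the kind txid_current_snapshot() reports are built from the 32-bit xid counter plus an epoch that bumps each time the counter wraps. A simplified sketch of that arithmetic (it ignores the special low xid values that real PostgreSQL skips over at wraparound):

```python
# Simplified model of the epoch/xid bookkeeping behind 64-bit txids:
# txid = epoch * 2^32 + xid, with the epoch incremented whenever the
# 32-bit counter wraps. Not PostgreSQL source; illustration only.
TWO_TO_32 = 2 ** 32

def to_txid64(epoch: int, xid: int) -> int:
    """Combine epoch and 32-bit xid into a monotonic 64-bit txid."""
    return epoch * TWO_TO_32 + xid

def advance(epoch: int, last_xid: int, next_xid: int):
    """Track the epoch: it bumps when the 32-bit counter wraps."""
    if next_xid < last_xid:
        epoch += 1
    return epoch, next_xid

epoch, xid = advance(0, 4_294_967_290, 5)  # counter wrapped: epoch is now 1
print(epoch, to_txid64(epoch, xid))        # 1 4294967301
```

If the shared-memory epoch is initialised in the wrong place (the bug the patch above fixes), a standby computes to_txid64() with a stale epoch and its 64-bit txids jump backwards across the wraparound.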
Re: [HACKERS] Standbys, txid_current_snapshot, wraparound
On Thu, Mar 29, 2012 at 03:23:01PM +0100, Simon Riggs wrote: On Thu, Mar 29, 2012 at 3:04 PM, Marko Kreen mark...@gmail.com wrote: Next question: how can flipping archive_mode on and off, with restarts, near wraparound point, break epoch on master? http://lists.pgfoundry.org/pipermail/skytools-users/2012-March/001609.html Any ideas? Could WAL playback from a backend with a different archive_mode setting cause it somehow? The current code would allow it, if you managed to go for 4 billion xids between checkpoints. If that happened, it would give an off-by-one error each time it happened. I don't believe that is possible.

Yes, that does not sound like a probable case.

-- marko
[HACKERS] HTTP Frontend? (and a brief thought on materialized views)
Hi guys, Something from Josh's recent blog post about summer of code clicked with me - the HTTP / SQL concept. It was something I'd been thinking about earlier, how people really like HTTP APIs and this is one of the drivers behind adoption of some NoSQL databases out there. Some things that I think are great about NoSQL that are missing in PostgreSQL are:

1. The reduce function, which could be modeled as a kind of materialized view with an index on it. With materialized views and the ability to pull fields from JSON and XML data structures you could easily do a NoSQL database inside of PostgreSQL by saving the document as a big string and then using materialized views to have all kinds of derived data from those documents. Sounds cool on paper, anyway.

2. HTTP RESTful API. This is obviously not as useful as a good admin tool like pgAdmin or a fast binary protocol like we have now, but it's way more buzzword compliant and would come in handy sometimes. With CouchDB I was able to allow public data in the database to be accessed directly from the browser using JSONP and/or a simple HTTP proxy in the server instead of doing any of that work within the app server. So, that saves a step somewhere. With some basic support for stored procedures and serving files directly via this HTTP thing you could potentially eliminate the app server for public, read-only parts of a site/application.

3. The perception of extremely low latency and/or high throughput - like fetching a row in less than 1ms or whatever. I don't know the exact numbers, I just know they have to be insanely low (latency) or impressively high (throughput). We could use PostgreSQL as a query cache to accelerate your MySQL! Something like that :-P.

4. Rebelliousness against the man. In our case SQL can't be the man, but we can find something PostgreSQL isn't and position against that.

Anyway, 1 & 2 seem inspiring to me and I'd love to help out with both. 3 & 4 seem more like marketing tasks of some sort...
I think I could probably start hacking up an HTTP API of some sort. I wasn't able to log into the PostgreSQL Wiki, so I created a page with some notes here: http://dobesv.com/docs/postgresql-http-api.html

For materialized views, I think that can be a bit of a can of worms, especially if you want to do incremental updates of some sort, because you have to figure out what rows need to be recalculated when you update a row in a source table, and how best to update them. But if someone thinks they know how, I can assist in the implementation.

Anyway, I hope I can help AND that I posted this in the correct mailing list! Cheers, Dobes
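The RESTful-API idea boils down to mapping HTTP requests onto SQL. A toy sketch of that mapping follows; the URL scheme, table names, and routing rules are invented for illustration, and a real gateway would additionally need authentication, connection pooling, and strict validation before anything touches the database:

```python
# Toy HTTP-to-SQL routing sketch: translate a RESTful path into a
# parameterized query (never interpolate the id into the SQL string).
# Everything here is hypothetical; not a proposed PostgreSQL API.
import re

def route(method: str, path: str):
    """Map e.g. GET /users/42 to a (query, params) pair."""
    m = re.fullmatch(r'/([a-z_]+)/([0-9]+)', path)
    if not m:
        raise ValueError('unsupported path')
    table, pk = m.group(1), int(m.group(2))
    if method == 'GET':
        return f'SELECT * FROM {table} WHERE id = $1', (pk,)
    if method == 'DELETE':
        return f'DELETE FROM {table} WHERE id = $1', (pk,)
    raise ValueError('unsupported method')

print(route('GET', '/users/42'))
# ('SELECT * FROM users WHERE id = $1', (42,))
```

Note that even this sketch restricts the table name to a safe character class; whitelisting the exposed tables explicitly would be safer still.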
Re: [HACKERS] HTTP Frontend? (and a brief thought on materialized views)
On 03/29/2012 10:37 AM, Dobes Vandermeer wrote: [...]

1. I've been in discussion with some people about adding simple JSON extract functions. We already have some (i.e. xpath()) for XML.

2. You might find htsql http://htsql.org/ interesting.

cheers

andrew
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Robert Haas robertmh...@gmail.com writes: On Wed, Mar 28, 2012 at 10:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: The SELECT INTO tests all fail, but we know the reason why (the testbed isn't expecting them to result in creating separate entries for the utility statement and the underlying plannable SELECT). This might be a dumb idea, but for a quick hack, could we just rig SELECT INTO, CREATE TABLE AS, and EXPLAIN not to create entries for themselves at all, without suppressing creation of an entry for the underlying query? It would make more sense to me to go the other way, that is suppress creation of a separate entry for the contained optimizable statement. The stats will still be correctly accumulated into the surrounding statement (or at least, if they are not, that's a separate pre-existing bug). If we do it in the direction you suggest, we'll fail to capture costs incurred outside execution of the contained statement. Right now, we already have logic in there to track nesting of statements in a primitive way, that is just count the nesting depth. My first idea about fixing this was to tweak that logic so that it stacks a flag saying we're in a utility statement that contains an optimizable statement, and then the first layer of Executor hooks that sees that flag set would know to not do anything. However this isn't quite good enough because that first layer might not be for the same statement. As an example, in an EXPLAIN ANALYZE the planner might pre-execute immutable or stable SQL functions before we reach the executor. We would prefer that any statements embedded in such a function still be seen as independent nested statements. However, I think there is a solution for that, though it may sound a bit ugly. Rather than just stacking a flag, let's stack the query source text pointer for the utility statement. 
Then in the executor hooks, if that pointer is *pointer* equal (not strcmp equal) to the optimizable statement's source-text pointer, we know we are executing the same statement as the surrounding utility command, and we do nothing. This looks like it would work for the SELECT INTO and EXPLAIN cases, and for DECLARE CURSOR whenever that gets changed to a less bizarre structure. It would not work for EXECUTE, because in that case we pass the query string saved from PREPARE to the executor. However, we could possibly do a special-case hack for EXECUTE; maybe ask prepare.c for the statement's string and stack that instead of the outer EXECUTE query string. Thoughts? regards, tom lane
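The pointer-equality trick Tom describes can be modeled compactly. This is not the pg_stat_statements C code, just an illustration of why identity comparison (rather than textual comparison) distinguishes "the surrounding utility statement's own execution" from a textually identical but distinct nested statement:

```python
# Model of stacking the utility statement's query-text pointer and
# comparing by identity in the executor hook. QueryText stands in for
# the C char * pointer; all names here are invented for illustration.
class QueryText:
    def __init__(self, text: str):
        self.text = text

utility_stack = []

def push_utility(q: QueryText):
    utility_stack.append(q)

def pop_utility():
    utility_stack.pop()

def is_same_statement(q: QueryText) -> bool:
    """Identity (pointer) comparison, deliberately not text equality."""
    return bool(utility_stack) and utility_stack[-1] is q

outer = QueryText("EXPLAIN ANALYZE SELECT 1")
copy = QueryText("EXPLAIN ANALYZE SELECT 1")  # equal text, distinct object

push_utility(outer)
print(is_same_statement(outer))  # True: skip, already counted by the utility stmt
print(is_same_statement(copy))   # False: a nested stmt, tracked separately
pop_utility()
```

The second check is the EXPLAIN ANALYZE scenario from the mail: a pre-executed SQL function could present equal text, but because it is a different object (pointer), it is still counted as an independent nested statement.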
Re: [HACKERS] Command Triggers patch v18
On Thu, Mar 29, 2012 at 8:30 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: I'll go make that happen, and still need input here. We first want to have command triggers on specific commands or ANY command, and we want to implement 3 places from where to fire them. Here's a new syntax proposal to cope with that:

  create command trigger before COMMAND_STEP of alter table execute procedure snitch();

- before the process utility switch, with only command tag and parse tree:
  create command trigger foo before start of alter table execute procedure snitch();

- before running the command, after having done basic error checks, security checks, name lookups and locking, with all information:
  create command trigger before execute of alter table execute procedure snitch();

- after having run the command:
  create command trigger foo before end of alter table execute procedure snitch();

One thought is that it might be better to say AT or ON or WHEN rather than BEFORE, since BEFORE END is just a little strange; and also because a future hook point might be something like permissions-checking, and we really want it to be AT permissions-checking time, not BEFORE permissions-checking time.
If I got to pick, I'd pick this syntax:

CREATE EVENT TRIGGER name
    ON event_name (event_subtype_name [, ...])
    EXECUTE PROCEDURE function_name(args);

Then you could eventually imagine:

CREATE EVENT TRIGGER prohibit_all_ddl ON ddl_command_start EXECUTE PROCEDURE throw_an_error();
CREATE EVENT TRIGGER prohibit_some_ddl ON ddl_command_start (alter_table) EXECUTE PROCEDURE throw_an_error();
CREATE EVENT TRIGGER replicate_table_drops ON sql_drop (table) EXECUTE PROCEDURE record_table_drop();
CREATE EVENT TRIGGER allow_whitelisted_extensions ON permissions_check (create_extension) EXECUTE PROCEDURE make_heroku_happy();
CREATE EVENT TRIGGER replicate_at_subcommands ON subcommand_start (alter_table) EXECUTE PROCEDURE record_alter_table_subcommand();

Piece by piece:

- I prefer EVENT to COMMAND, because the start or end or any other point during the execution of a command can be viewed as an event; however, it is pretty clear that not everything that can be viewed as an event upon which we might like triggers to fire can be viewed as a command, even if we have a somewhat well-defined notion of subcommands. I continue to think that there is no sense in which creating a sequence to back a serial column or dropping an object due to DROP .. CASCADE is a subcommand; it is, however, pretty clearly an event. Anything we want can be an event. I think if we don't get this right in the first version we're going to be stuck with a misnomer forever. :-(

- It is pretty clear that for triggers that are supposed to fire at the start or end of a command, it's useful to be able to specify which commands it fires for. However, there must be other types of events (as in my postulated sql_drop case above) where the name of the command is not relevant - people aren't going to find all the objects that got dropped as a result of a toplevel drop table command; they're going to want to find all the tables that got dropped regardless of which incantation did it.
Also keep in mind that you can do things like use ALTER TABLE to rename a view (and there are other cases of that sort of confusion with domains vs. types, aggregates vs. functions, etc), so being able to filter based on object type seems clearly better in a bunch of real-world cases. And, even if you don't believe that specific example, a general syntax leaves us with freedom to maneuver if we discover other needs in the future. Writing alter_table etc. as tokens rather than as ALTER TABLE also has the further advantages of (1) greatly decreasing the parser footprint of this feature and (2) relieving everyone who adds commands in the future of the need to also change the command-trigger grammar. - I don't think that triggers at what Dimitri is calling before execute are a good idea at all for the first version of this feature, because there is a lot of stuff that might break and examining that should really be a separate project from getting the basic infrastructure in place. However, I also don't like the name. Unlike queries, DDL commands don't have clearly separate planning and execution phases, so saying that something happens before execution isn't really very clear. As previously discussed, the hook points in the latest patch are not all entirely consistent between one command and the next, not only in the larger ways I've already pointed out but also in some smaller ways. Different commands do different kinds of integrity checks and the firing points are not always in exactly the same place in that sequence. Of course, not all the commands do all the checks in the same order, so some possibly-not-trivial refactoring is probably required to get that consistent. Moreover, it doesn't seem out of reach to
Re: [HACKERS] Command Triggers patch v18
Robert Haas robertmh...@gmail.com writes: On Thu, Mar 29, 2012 at 9:01 AM, Thom Brown t...@linux.com wrote: On 29 March 2012 13:30, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: I'll go make that happen, and still need input here. We first want to have command triggers on specific commands or ANY command, and we want to implement 3 places from where to fire them. Here's a new syntax proposal to cope with that: Is it necessary to add this complexity in this version? Can't we keep it simple but in a way that allows the addition of this later? The testing of all these new combinations sounds like a lot of work. I concur. This is way more complicated than we should be trying to do in version 1.

I'm at a loss here. This proposal was so that we can reach a commonly agreed minimal solution and design in first version. There's no new piece of infrastructure to add, the syntax is changed only to open the road for later, I'm not changing where the current command triggers are to be called (except for those which are misplaced). So, please help me here: what do we want to have in 9.3?

Regards,

-- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On Thu, Mar 29, 2012 at 11:23 AM, Tom Lane t...@sss.pgh.pa.us wrote: It would make more sense to me to go the other way, that is suppress creation of a separate entry for the contained optimizable statement. The stats will still be correctly accumulated into the surrounding statement (or at least, if they are not, that's a separate pre-existing bug). If we do it in the direction you suggest, we'll fail to capture costs incurred outside execution of the contained statement. All things being equal, I completely agree. However, ISTM that the difficulty of implementation might be higher for your proposal, for the reasons you go on to state. If getting it right means that other significant features are not going to get committed at all for 9.2, I think we could leave this as a TODO. Right now, we already have logic in there to track nesting of statements in a primitive way, that is just count the nesting depth. My first idea about fixing this was to tweak that logic so that it stacks a flag saying we're in a utility statement that contains an optimizable statement, and then the first layer of Executor hooks that sees that flag set would know to not do anything. However this isn't quite good enough because that first layer might not be for the same statement. As an example, in an EXPLAIN ANALYZE the planner might pre-execute immutable or stable SQL functions before we reach the executor. We would prefer that any statements embedded in such a function still be seen as independent nested statements. However, I think there is a solution for that, though it may sound a bit ugly. Rather than just stacking a flag, let's stack the query source text pointer for the utility statement. Then in the executor hooks, if that pointer is *pointer* equal (not strcmp equal) to the optimizable statement's source-text pointer, we know we are executing the same statement as the surrounding utility command, and we do nothing. Without wishing to tick you off, that sounds both ugly and fragile. 
Can't we find a way to set the stacked flag (on the top stack frame) after planning and before execution? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Robert Haas robertmh...@gmail.com writes: On Thu, Mar 29, 2012 at 11:23 AM, Tom Lane t...@sss.pgh.pa.us wrote: However, I think there is a solution for that, though it may sound a bit ugly. Rather than just stacking a flag, let's stack the query source text pointer for the utility statement. Then in the executor hooks, if that pointer is *pointer* equal (not strcmp equal) to the optimizable statement's source-text pointer, we know we are executing the same statement as the surrounding utility command, and we do nothing. Without wishing to tick you off, that sounds both ugly and fragile. What do you object to --- the pointer-equality part? We could do strcmp comparison instead, on the assumption that a utility command could not look the same as an optimizable statement except in the case we care about. I think that's probably unnecessary though. Can't we find a way to set the stacked flag (on the top stack frame) after planning and before execution? That would require a way for pg_stat_statements to get control at rather random places in several different types of utility statements. And if we did add hook functions in those places, you'd still need to have sufficient stacked context for those hooks to know what to do, which leads you right back to this I think. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers patch v18
On 29 March 2012 16:30, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Mar 29, 2012 at 9:01 AM, Thom Brown t...@linux.com wrote: On 29 March 2012 13:30, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: I'll go make that happen, and still need input here. We first want to have command triggers on specific commands or ANY command, and we want to implement 3 places from where to fire them. Here's a new syntax proposal to cope with that: Is it necessary to add this complexity in this version? Can't we keep it simple but in a way that allows the addition of this later? The testing of all these new combinations sounds like a lot of work. I concur. This is way more complicated than we should be trying to do in version 1. I'm at a loss here. This proposal was so that we can reach a commonly agreed minimal solution and design in first version. There's no new piece of infrastructure to add, the syntax is changed only to open the road for later, I'm not changing where the current command triggers are to be called (except for those which are misplaced). So, please help me here: what do we want to have in 9.3? Perhaps I misunderstood. It was the addition of the fine-grained event options (parse, execute, etc.) I saw as new. If you're saying those are just possible options for later that won't be in this version, I'm fine with that. If those are to make it for 9.2, then creating the necessary test cases and possible fixes sounds infeasible in such a short space of time. Please disregard if this is not the case. -- Thom
Re: [HACKERS] poll: CHECK TRIGGER?
On 28.03.2012 23:54, Pavel Stehule wrote: 2012/3/28 Heikki Linnakangas heikki.linnakan...@enterprisedb.com: In prepare_expr(), you use a subtransaction to catch any ERRORs that happen during parsing the expression. That's a good idea, and I think many of the check_* functions could be greatly simplified by adopting a similar approach. Just ereport() any errors you find, and catch them at the appropriate level, appending the error to the output string. Your current approach of returning true/false depending on whether there were any errors seems tedious. This is not possible when we want to enable fatal_errors = false checking. I can use a subtransaction in prepare_expr, because it is the deepest level, but I cannot use it elsewhere, because I cannot handle the exception and continue with the other parts of the statement. Well, you can continue on the next statement. That's almost as good. In practice, if there's one error in a statement, it seems unlikely that you would correctly diagnose other errors on the same line. They're more likely to be fallout of the same mistake. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
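The fatal_errors = false behavior being debated above can be sketched in standalone C (this is an illustration of the control flow only, not the plpgsql_check code; all names are invented). Each statement's check runs as if in its own subtransaction, so an error is recorded and checking continues, instead of aborting at the first problem:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-statement check: returns an error message or NULL. */
typedef const char *(*check_fn)(void);

const char *check_ok(void)   { return NULL; }
const char *check_bad_if(void)   { return "error in IF branch"; }
const char *check_bad_else(void) { return "error in ELSE branch"; }

/*
 * Check every statement. With fatal_errors = false we keep going after an
 * error (as if each check ran in its own subtransaction), collecting all
 * diagnostics; with fatal_errors = true we stop at the first one.
 * Returns the number of errors collected into errors_out.
 */
int
check_statements(check_fn *stmts, int nstmts, bool fatal_errors,
                 const char **errors_out)
{
    int nerrors = 0;

    for (int i = 0; i < nstmts; i++)
    {
        const char *err = stmts[i]();   /* subtransaction boundary */

        if (err != NULL)
        {
            errors_out[nerrors++] = err;
            if (fatal_errors)
                break;                  /* abort on first error */
        }
    }
    return nerrors;
}
```

With fatal_errors = false, an error in the IF branch no longer hides an independent error in the ELSE branch, which is exactly Pavel's point.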
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On Thu, Mar 29, 2012 at 11:42 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Mar 29, 2012 at 11:23 AM, Tom Lane t...@sss.pgh.pa.us wrote: However, I think there is a solution for that, though it may sound a bit ugly. Rather than just stacking a flag, let's stack the query source text pointer for the utility statement. Then in the executor hooks, if that pointer is *pointer* equal (not strcmp equal) to the optimizable statement's source-text pointer, we know we are executing the same statement as the surrounding utility command, and we do nothing. Without wishing to tick you off, that sounds both ugly and fragile. What do you object to --- the pointer-equality part? We could do strcmp comparison instead, on the assumption that a utility command could not look the same as an optimizable statement except in the case we care about. I think that's probably unnecessary though. The pointer equality part seems like the worst ugliness, yes. Can't we find a way to set the stacked flag (on the top stack frame) after planning and before execution? That would require a way for pg_stat_statements to get control at rather random places in several different types of utility statements. And if we did add hook functions in those places, you'd still need to have sufficient stacked context for those hooks to know what to do, which leads you right back to this I think. What I'm imagining is that instead of just having a global for nested_level, you'd have a global variable pointing to a linked list. The length of the list would be equal to what we currently call nested_level + 1. Something like this:

typedef struct pgss_nesting_info
{
    struct pgss_nesting_info *next;
    int flag;    /* bad name */
} pgss_nesting_info;

static pgss_nesting_info *pgss_stack_top;

So any test for nesting_depth == 0 would instead test pgss_stack_top->next == NULL. Then, when you get control at the relevant spots, you set pgss_stack_top->flag = 1 and that's it.
Now, maybe it's too ugly to think about passing control at those spots; I'm surprised there's not a central point they all go through... Another thought is: if we simply treated these as nested queries for all purposes, would that really be so bad? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Robert Haas robertmh...@gmail.com writes: What I'm imagining is that instead of just having a global for nested_level, you'd have a global variable pointing to a linked list. This is more or less what I have in mind, too, except I do not believe that a mere boolean flag is sufficient to tell the difference between an executor call that you want to suppress logging for and one that you do not. You need some more positive way of identifying the target statement than that, and what I propose that be is the statement's query string. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
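Tom's combination of the two ideas above — a per-nesting-level stack frame that carries the utility statement's query string rather than a bare flag — can be sketched as standalone C (an illustration only, not pg_stat_statements code; all names are invented):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * One stack frame per nesting level. Instead of a bare boolean, each
 * frame remembers the query-string pointer of the utility statement
 * being executed at that level (NULL if none).
 */
typedef struct pgss_frame
{
    struct pgss_frame *next;
    const char *query_text;
} pgss_frame;

pgss_frame *pgss_stack_top = NULL;

void
push_frame(const char *query_text)
{
    pgss_frame *f = malloc(sizeof(pgss_frame));

    f->next = pgss_stack_top;
    f->query_text = query_text;
    pgss_stack_top = f;
}

void
pop_frame(void)
{
    pgss_frame *f = pgss_stack_top;

    pgss_stack_top = f->next;
    free(f);
}

/*
 * Executor-hook side: skip accounting when the statement being executed
 * is the very one the surrounding utility command wrapped, detected by
 * pointer (not strcmp) equality on the source text. A statement with a
 * different source text (e.g. one pre-executed inside a SQL function)
 * is still tracked as an independent nested statement.
 */
bool
should_track(const char *stmt_text)
{
    return pgss_stack_top == NULL
        || pgss_stack_top->query_text != stmt_text;
}
```

This is why the stacked query string positively identifies the target statement where a flag cannot: two different statements can both be "one level down", but only the contained statement shares its source-text pointer with the surrounding utility command.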
Re: [HACKERS] Command Triggers patch v18
Robert Haas robertmh...@gmail.com writes: create command trigger before COMMAND_STEP of alter table execute procedure snitch(); One thought is that it might be better to say AT or ON or WHEN rather than BEFORE, since BEFORE END is just a little strange; and also because a future hook point might be something like permissions-checking, and we really want it to be AT permissions-checking time, not BEFORE permissions-checking time. Yeah I tried different spellings and almost sent a version using AT or WHEN, but it appeared to me not to be specific enough: AT permission checking time does not exist, it either happens before or after permission checking. I played with using “preceding” rather than before, too, maybe you would like that better. So I would document the different steps and their ordering, then use only BEFORE as a qualifier, and add a pseudo step which is the end of the command. If I got to pick, I'd pick this syntax:

CREATE EVENT TRIGGER name ON event_name (event_subtype_name [, ...]) EXECUTE PROCEDURE function_name(args);

Then you could eventually imagine:

CREATE EVENT TRIGGER prohibit_all_ddl ON ddl_command_start EXECUTE PROCEDURE throw_an_error();
CREATE EVENT TRIGGER prohibit_some_ddl ON ddl_command_start (alter_table) EXECUTE PROCEDURE throw_an_error();
CREATE EVENT TRIGGER replicate_table_drops ON sql_drop (table) EXECUTE PROCEDURE record_table_drop();
CREATE EVENT TRIGGER allow_whitelisted_extensions ON permissions_check (create_extension) EXECUTE PROCEDURE make_heroku_happy();
CREATE EVENT TRIGGER replicate_at_subcommands ON subcommand_start (alter_table) EXECUTE PROCEDURE record_alter_table_subcommand();

I would amend that syntax to allow for a WHEN expr much like in the DML trigger case, where the expression can play with the variables exposed at the time of calling the trigger.
create event trigger prohibit_some_ddl preceding timing spec when tag in ('CREATE TABLE', 'ALTER TABLE') execute procedure throw_an_error();

We must also take some time to describe the timing specs carefully, their order of operation and which information is available for triggers in each of them. See below. I also don't like the way you squeeze in the drop cascade support here by requalifying it as a timing spec, which it's clearly not. On subcommand_start, what is the tag set to? the main command or the subcommand? if the former, where do you find the current subcommand tag? I don't think subcommand_start should be a timing spec either. Piece by piece: - I prefer EVENT to COMMAND, because the start or end or any other point during the execution of a command can be viewed as an event; however, it is pretty clear that not everything that can be viewed as an event upon which we might like triggers to fire can be viewed as a command, even if we have a somewhat well-defined notion of subcommands. I continue to think that there is no sense in which creating a sequence to back a serial column or dropping an object due to DROP .. CASCADE is a subcommand; it is, however, pretty clearly an event. Anything we want can be an event. I think if we don't get this right in the first version we're going to be stuck with a misnomer forever. :-( I can buy naming what we're building here EVENT TRIGGERs, albeit with some caveats: what I implemented here only caters for commands for which we have a distinct parse tree. That's what subcommands are in effect, and why it supports CREATE SCHEMA subcommands but not all ALTER TABLE variants, and not cascaded drops. Also, implementing cascade operations as calling process utility with a complete parse tree is no small project on its own.
We can also punt that and document that triggers fired on cascaded drops are not receiving a parse tree, only the command tag and object id and object name and schema name, or even just the command tag and object id in order not to incur too many additional name lookups. Lastly, EVENT opens the road to thinking about transaction event triggers, on begin|commit|rollback etc. Do we want to take that road? - It is pretty clear that for triggers that are supposed to fire at the start or end of a command, it's useful to be able to specify which commands it fires for. However, there must be other types of events (as in my postulated sql_drop case above) where the name of the command is not relevant - people aren't going to find all the objects that got dropped as a result of a toplevel drop table command; they're going to want to find all the tables that got dropped regardless of which incantation did it. Also keep in mind that you can do things like use ALTER TABLE to rename a view (and there are other cases of that sort of confusion with domains vs. types, aggregates vs. functions, etc), so being able to filter based on object type seems clearly better in a bunch of real-world cases. And, even if you don't believe that specific
Re: [HACKERS] Command Triggers patch v18
On Thu, Mar 29, 2012 at 11:49 AM, Thom Brown t...@linux.com wrote: On 29 March 2012 16:30, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Mar 29, 2012 at 9:01 AM, Thom Brown t...@linux.com wrote: On 29 March 2012 13:30, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: I'll go make that happen, and still need input here. We first want to have command triggers on specific commands or ANY command, and we want to implement 3 places from where to fire them. Here's a new syntax proposal to cope with that: Is it necessary to add this complexity in this version? Can't we keep it simple but in a way that allows the addition of this later? The testing of all these new combinations sounds like a lot of work. I concur. This is way more complicated than we should be trying to do in version 1. I'm at a loss here. This proposal was so that we can reach a commonly agreed minimal solution and design in first version. There's no new piece of infrastructure to add, the syntax is changed only to open the road for later, I'm not changing where the current command triggers are to be called (except for those which are misplaced). So, please help me here: what do we want to have in 9.3? Perhaps I misunderstood. It was the addition of the fine-grained even options (parse, execute etc) I saw as new. If you're saying those are just possible options for later that won't be in this version, I'm fine with that. If those are to make it for 9.2, then creating the necessary test cases and possible fixes sounds infeasible in such a short space of time. Please disregard if this is not the case. So... I've said repeatedly and over a long period of time that development of this feature wasn't started early enough in the cycle to get it finished in time for 9.2. 
I think that I've identified some pretty serious issues in the discussion we've had so far, especially (1) the points at which triggers are fired aren't consistent between commands, (2) not much thought has been given to what happens if, say, a DDL trigger performs a DDL operation on the table the outer DDL command is due to modify, and (3) we are eventually going to want to trap a much richer set of events than can be captured by the words "before" and "after". Now, you could view this as me throwing up roadblocks to validate my previously-expressed opinion that this wasn't going to get done, but I really, honestly believe that these are important issues and that getting them right is more important than getting something done now. If we still want to try to squeeze something into 9.2, I recommend stripping out everything except for what Dimitri called the before-any-command firing point. In other words, add a way to run a procedure after parsing of a command but before any name lookups have been done, any permissions checks have been done, or any locks have been taken. The usefulness of such a hook is of course limited but it is also a lot less invasive than the patch I recently reviewed and probably a lot safer. I actually think it's wise to do that as a first step even if it doesn't make 9.2, because it is much easier to build features like this incrementally and even a patch that does that will be reasonably complicated and difficult to review. Parenthetically, what Dimitri previously called the after-any-command firing point, all the way at the end of the statement but without any specific details about the object the statement operated on, seems just as good for a first step, maybe better, so that would be a fine foundation from my point of view as well. The stuff that happens somewhere in the middle, even just after locking and permissions checking, is more complex and I think that should be phase 2 regardless of which phase ends up in which release cycle.
I'm not sure whether Dimitri's question about 9.3 was a typo for 9.2 or whether that's what he was actually asking about. But to answer that question for 9.3, I think we have time to build an extremely extensive infrastructure that covers a huge variety of use cases, and I think there is hardly any reasonable proposal that is off the table as far as that goes. We have a year to get that work done, and a year is a long time, and with incremental progress at each CommitFest we can do a huge amount. I can also say that I would be willing to put in some serious work during the 9.3 cycle to accelerate that progress, too, especally if (ahem) some other people can return the favor by reviewing my patches. :-) I could see us adding the functionality described above in one CommitFest and then spending the next three adding more whiz-bango frammishes and ending up with something really nice. Right now, though, we are very crunched for time, and probably shouldn't be entertaining anything that requires a tenth the code churn that this patch probably does; if we are going to do anything at all, it had better be as simple and uninvasive as we
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
[ forgot to respond to this bit ] Robert Haas robertmh...@gmail.com writes: Another thought is: if we simply treated these as nested queries for all purposes, would that really be so bad? That was actually what I suggested first, and now that I look at the code, that's exactly what's happening right now. However, Peter pointed out --- I think rightly --- that this fails to satisfy the principle of least astonishment, because the user doesn't think of these statements as being utility statements with other statements wrapped inside. Users are going to expect these cases to be treated as single statements. The issue existed before this patch, BTW, but was partially masked by the fact that we grouped pg_stat_statements view entries strictly by query text, so that both the utility statement and the contained optimizable statement got matched to the same table entry. The execution costs were getting double-counted, but apparently nobody noticed that. As things now stand, the utility statement and contained statement show up as distinct table entries (if you have nested-statement tracking enabled) because of the different hashing methods used. And it's those multiple table entries that seem confusing, especially since they are counting mostly the same costs. [ time passes ... ] Hm ... I just had a different idea. I need to go look at the code again, but I believe that in the problematic cases, the post-analyze hook does not compute a queryId for the optimizable statement. This means that it will arrive at the executor with queryId zero. What if we simply made the executor hooks do nothing when queryId is zero? (Note that this also means that in the problematic cases, the behavior is already pretty wrong because executor costs for *all* statements of this sort are getting merged into one hashtable entry for hash zero.) 
regards, tom lane
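Tom's "do nothing when queryId is zero" idea reduces to a very small change in the executor hooks. A standalone sketch (an illustration of the idea only, not the pg_stat_statements source; names are invented):

```c
#include <assert.h>
#include <stdint.h>

/* Number of statements the mock hook has decided to track. */
int tracked_executions = 0;

/*
 * Sketch of an ExecutorStart-style hook: if the post-analyze hook never
 * computed a queryId for this statement (it is still zero), do nothing.
 * The statement's costs then accrue only to the surrounding utility
 * statement's entry, instead of all such statements being merged into
 * one bogus hash-zero entry.
 */
void
pgss_executor_start_sketch(uint64_t queryId)
{
    if (queryId == 0)
        return;                 /* contained statement: skip */
    tracked_executions++;
}
```

The appeal of this variant is that it needs no stacked context at all: the absence of a computed queryId is itself the marker that the statement is contained in a utility command.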
Re: [HACKERS] Command Triggers patch v18
Robert Haas robertmh...@gmail.com writes: I've said repeatedly and over a long period of time that development of this feature wasn't started early enough in the cycle to get it finished in time for 9.2. I think that I've identified some pretty That could well be, yeah. serious issues in the discussion we've had so far, especially (1) the points at which triggers are fired aren't consistent between commands, (2) not much thought has been given to what happens if, say, a DDL trigger performs a DDL operation on the table the outer DDL command is due to modify, and (3) we are eventually going to want to trap a much richer set of events than can be captured by the words "before" and "after". Now, you could view this as me throwing up roadblocks to validate my previously-expressed opinion that this wasn't going to get done, but I really, honestly believe that these are important issues and that getting them right is more important than getting something done now. (1) is not hard to fix as soon as we set a policy, which is the most pressing thing we need to do, in my mind, whatever we manage to commit in 9.2. (2) can be addressed with a simple blacklist that would be set just before calling user-defined code, and cleaned when done running it. Once in place, the blacklist lookup is easy to implement in a central place (utility.c or cmdtrigger.c), with an ereport() when the current command is in the blacklist. E.g. ALTER TABLE would blacklist ALTER TABLE and DROP TABLE commands on the current object id. (3) we need to continue designing that, yes. I think we can have a first set of events defined now, even if we don't implement support for all of them readily. If we still want to try to squeeze something into 9.2, I recommend stripping out everything except for what Dimitri called the before-any-command firing point. In other words, add a way to run a I would like us to make that consistent rather than throw it away, or maybe salvage the part of the commands we support here.
It's easy enough, if tiresome, to document which commands are supported in which event timing. procedure after parsing of a command but before any name lookups have been done, any permissions checks have been done, or any locks have been taken. The usefulness of such a hook is of course limited but it is also a lot less invasive than the patch I recently reviewed and probably a lot safer. I actually think it's wise to do that as a first step even if it doesn't make 9.2, because it is much easier to build features like this incrementally and even a patch that does that will be reasonably complicated and difficult to review. Yes. Parenthetically, what Dimitri previously called the after-any-command firing point, all the way at the end of the statement but without any specific details about the object the statement operated on, seems just as good for a first step, maybe better, so that would be a fine foundation from my point of view as well. The stuff that happens Now that fetching the information is implemented, I guess that we could still provide for it when firing event triggers at that timing spec. Of course that means a bigger patch to review when compared to not having the feature, but Thom did spend loads of time to test this part of the implementation. somewhere in the middle, even just after locking and permissions checking, is more complex and I think that should be phase 2 regardless of which phase ends up in which release cycle. Mmmm, ok. I'm not sure whether Dimitri's question about 9.3 was a typo for 9.2 Typo :) I appreciate your offer, though :) Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
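Dimitri's proposed blacklist for point (2) — registering (command tag, object id) pairs before running user-defined trigger code, and checking them centrally before executing a command — can be sketched as standalone C (an illustration only, not patch code; all names are invented):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_BLACKLIST 8

/*
 * Reentrancy guard: while user-defined trigger code runs, the outer DDL
 * command blacklists operations on the object it is about to modify.
 * A central dispatcher (utility.c in the proposal) would consult this
 * and ereport() on a hit; here we just return a boolean.
 */
typedef struct
{
    const char *tag;            /* command tag, e.g. "ALTER TABLE" */
    unsigned oid;               /* target object id */
} bl_entry;

static bl_entry blacklist[MAX_BLACKLIST];
static int bl_len = 0;

void
bl_add(const char *tag, unsigned oid)
{
    blacklist[bl_len].tag = tag;
    blacklist[bl_len].oid = oid;
    bl_len++;
}

/* Cleared once the user-defined code has finished running. */
void
bl_clear(void)
{
    bl_len = 0;
}

bool
bl_hit(const char *tag, unsigned oid)
{
    for (int i = 0; i < bl_len; i++)
        if (blacklist[i].oid == oid && strcmp(blacklist[i].tag, tag) == 0)
            return true;
    return false;
}
```

Per the example in the message, an outer ALTER TABLE would call bl_add("ALTER TABLE", reloid) and bl_add("DROP TABLE", reloid) before firing triggers, and bl_clear() afterwards, so a trigger recursing into the same table is rejected while commands on other objects still pass.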
Re: [HACKERS] poll: CHECK TRIGGER?
2012/3/29 Heikki Linnakangas heikki.linnakan...@enterprisedb.com: On 28.03.2012 23:54, Pavel Stehule wrote: 2012/3/28 Heikki Linnakangas heikki.linnakan...@enterprisedb.com: In prepare_expr(), you use a subtransaction to catch any ERRORs that happen during parsing the expression. That's a good idea, and I think many of the check_* functions could be greatly simplified by adopting a similar approach. Just ereport() any errors you find, and catch them at the appropriate level, appending the error to the output string. Your current approach of returning true/false depending on whether there were any errors seems tedious. This is not possible when we want to enable fatal_errors = false checking. I can use a subtransaction in prepare_expr, because it is the deepest level, but I cannot use it elsewhere, because I cannot handle the exception and continue with the other parts of the statement. Well, you can continue on the next statement. That's almost as good. In practice, if there's one error in a statement, it seems unlikely that you would correctly diagnose other errors on the same line. They're more likely to be fallout of the same mistake. No; for example, if I find an error in the IF/THEN statements, I still want to check the ELSE statements. Regards, Pavel
Re: [HACKERS] pgxs and bison, flex
On Wed, 2012-03-28 at 22:12 -0400, Tom Lane wrote: Peter Eisentraut pete...@gmx.net writes: I propose that we apply the attached patch to make sure those variables are set to a usable default value in any case. Won't this break usages such as in contrib/cube?

cubeparse.c: cubeparse.y
ifdef BISON
	$(BISON) $(BISONFLAGS) -o $@ $<
else
	@$(missing) bison $< $@
endif

No, the code in my patch is conditional on 'ifdef PGXS'. (Not visible in the patch, unfortunately.) I don't think we want to support external use of $(missing), since the text it refers to is specific to PostgreSQL's distribution mechanisms. IMO, people building distribution packages in clean chroots are best advised to include bison and flex even if they're not strictly necessary. Otherwise the build could fall over unexpectedly and unnecessarily, depending on file timestamps and other phase-of-the-moon issues. If the package maker has adopted that elementary bit of self-defense, the whole thing is a non-problem. I don't agree with that. If this were a problem, dozens of packages would be liable to break randomly, and this knowledge would have passed into best practices and policies decades ago. In any case, it won't really help because we can't enforce it, and we can't tell the average person who installs from source to install additional build dependencies that the installation instructions explicitly say are not needed.
Re: [HACKERS] pgxs and bison, flex
Peter Eisentraut pete...@gmx.net writes: On ons, 2012-03-28 at 22:12 -0400, Tom Lane wrote: Won't this break usages such as in contrib/cube? No, the code in my patch is conditional on 'ifdef PGXS'. (Not visible in the patch, unfortunately.) Oh, okay. I don't think we want to support external use of $(missing), since the text it refers to is specific to PostgreSQL's distribution mechanisms. Fair enough. Objection withdrawn. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers patch v18
On Thu, Mar 29, 2012 at 12:17 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Robert Haas robertmh...@gmail.com writes: create command trigger before COMMAND_STEP of alter table execute procedure snitch(); One thought is that it might be better to say AT or ON or WHEN rather than BEFORE, since BEFORE END is just a little strange; and also because a future hook point might be something like permissions-checking, and we really want it to be AT permissions-checking time, not BEFORE permissions-checking time. Yeah I tried different spellings and almost sent a version using AT or WHEN, but it appeared to me not to be specific enough: AT permission checking time does not exist, it either happens before or after permission checking. I played with using “preceding” rather than before, too, maybe you would like that better. Well, preceding and before are synonyms, so I don't see any advantage in that change. But I really did mean AT permissions_checking time, not before or after it. That is, we'd have a hook where instead of doing something like this:

    aclresult = pg_class_aclcheck(reloid, GetUserId(), ACL_SELECT);

...we'd call a superuser-supplied trigger function and let it make the call. This requires a clear separation of permissions and integrity checking that we are currently lacking, and probably quite a bit of other engineering too, but I think we should assume that we're going to do it at some point because it seems pretty clear that there is a huge amount of pent-up demand for being able to do things like this.
If I got to pick, I'd pick this syntax: CREATE EVENT TRIGGER name ON event_name (event_subtype_name [, ...]) EXECUTE PROCEDURE function_name(args); Then you could eventually imagine: CREATE EVENT TRIGGER prohibit_all_ddl ON ddl_command_start EXECUTE PROCEDURE throw_an_error(); CREATE EVENT TRIGGER prohibit_some_ddl ON ddl_command_start (alter_table) EXECUTE PROCEDURE throw_an_error(); CREATE EVENT TRIGGER replicate_table_drops ON sql_drop (table) EXECUTE PROCEDURE record_table_drop(); CREATE EVENT TRIGGER allow_whitelisted_extensions ON permissions_check (create_extension) EXECUTE PROCEDURE make_heroku_happy(); CREATE EVENT TRIGGER replicate_at_subcommands ON subcommand_start (alter_table) EXECUTE PROCEDURE record_alter_table_subcommand(); I would amend that syntax to allow for a WHEN expr much like in the DML trigger case, where the expression can play with the variables exposed at the time of calling the trigger. create event trigger prohibit_some_ddl preceding timing spec when tag in ('CREATE TABLE', 'ALTER TABLE') execute procedure throw_an_error(); We could do it that way, but the tag in ('CREATE TABLE', 'ALTER TABLE') syntax will be tricky to evaluate. I'd really prefer NOT to need to start up and shut down the executor to figure out whether the trigger has to fire. If we are going to do it that way then we need to carefully define what variables are available and what values they get. I think that most people's event-filtering needs will be simple enough to be served by a more declarative syntax. We must also take some time to describe the timing specs carefully, their order of operation and which information are available for triggers in each of them. See below. I also don't like the way you squeeze in the drop cascade support here by re qualifying it as a timing spec, which it's clearly not. On subcommand_start, what is the tag set to? the main command or the subcommand? if the former, where do you find the current subcommand tag? 
I don't think subcommand_start should be a timing spec either. That's up for grabs, but I was thinking the subcommand. For example, consider: alter table t alter column a add column b int, disable trigger all; I would imagine this having a firing sequence something like this: ddl_command_start:alter_table ddl_command_permissions_check:alter_table ddl_command_name_lookup:alter_table alter_table_subcommand_prep:add_column alter_table_subcommand_prep:disable_trigger_all alter_table_subcommand_start:add_column sql_create:column alter_table_subcommand_end:add_column alter_table_subcommand_start:disable_trigger_all alter_table_subcommand_end:disable_trigger_all ddl_command_end:alter_table Lots of room for argument there, of course. Piece by piece: - I prefer EVENT to COMMAND, because the start or end or any other point during the execution of a command can be viewed as an event; however, it is pretty clear that not everything that can be viewed as an event upon which we might like triggers to fire can be viewed as a command, even if we have a somewhat well-defined notion of subcommands. I continue to think that there is no sense in which creating a sequence to back a serial column or dropping an object due to DROP .. CASCADE is a subcommand; it is, however, pretty clearly an event. Anything we want can be an event. I think if we don't get
Re: [HACKERS] ECPG FETCH readahead
On Thu, Mar 29, 2012 at 12:59:40PM +0200, Boszormenyi Zoltan wrote: On 2012-03-29 02:43, Noah Misch wrote: On Sat, Mar 24, 2012 at 10:49:07AM +0100, Boszormenyi Zoltan wrote: +to <literal>READAHEAD number</literal>. Explicit <literal>READAHEAD number</literal> or +<literal>NO READAHEAD</literal> turns cursor readahead on (with <literal>number</literal> +as the size of the readahead window) or off for the specified cursor, +respectively. For cursors using a non-0 readahead window size is 256 rows, The number 256 is no longer present in your implementation. Indeed. Oversight, that part of the sentence is deleted. +the window size may be modified by setting the <literal>ECPGFETCHSZ</literal> +environment variable to a different value. I had in mind that DECLARE statements adorned with READAHEAD syntax would always behave precisely as written, independent of ECPGFETCHSZ or ecpg -R. Unadorned DECLARE statements would use ECPGFETCHSZ if set, then the value passed to ecpg -R if provided, and finally a value of 1 (no readahead) in the absence of both ECPGFETCHSZ and ecpg -R. Did you do it differently for a particular reason? I don't particularly object to what you've implemented, but I'd be curious to hear your thinking. What I had in mind was: NO READAHEAD == READAHEAD 0. This will translate into 1 tuple sized readahead window that cannot be modified by ECPGFETCHSZ. READAHEAD 1 also means uncached by default but ECPGFETCHSZ may modify the readahead window size. To me, it feels too magical to make READAHEAD 1 and READAHEAD 0 differ only in whether ECPGFETCHSZ can override. I'm willing to go with whatever consensus arises on this topic, though. This part is policy that can be debated and modified accordingly, it doesn't affect the internals of the caching code. Agreed.
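The precedence Noah proposes for choosing a cursor's effective readahead window can be stated compactly. The following is a toy model of that proposal only (not the committed ECPG behavior, and the argument names are invented for illustration): an adorned DECLARE always wins; otherwise ECPGFETCHSZ, then the ecpg -R value, then 1.

```python
def effective_readahead(declared=None, ecpgfetchsz=None, ecpg_r=None):
    """Resolve a cursor's readahead window size under Noah's proposed rules.

    declared    -- value from READAHEAD n / NO READAHEAD on the DECLARE,
                   or None if the DECLARE was unadorned
    ecpgfetchsz -- ECPGFETCHSZ environment variable, or None if unset
    ecpg_r      -- value passed to ecpg -R, or None if not provided
    """
    if declared is not None:
        # Adorned DECLARE behaves exactly as written; NO READAHEAD (0)
        # still needs a window of one row to fetch anything at all.
        return max(declared, 1)
    for value in (ecpgfetchsz, ecpg_r):
        if value is not None:
            return max(value, 1)
    return 1  # neither ECPGFETCHSZ nor ecpg -R: no readahead
```

The point of contention in the thread (whether READAHEAD 0 and READAHEAD 1 should differ only in ECPGFETCHSZ overridability) corresponds to whether the first branch consults `ecpgfetchsz` when `declared == 1`.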
+</para> + +<para> +<command>UPDATE</command> or <command>DELETE</command> with the +<literal>WHERE CURRENT OF</literal> clause, cursor readahead may or may not +actually improve performance, as <command>MOVE</command> statement has to be +sent to the backend before the DML statement to ensure correct cursor position +in the backend. +</para> This sentence seems to be missing a word near its beginning. Sounds like a corner case to me, but I am not a native english speaker. Which one sounds better: UPDATE or DELETE with the WHERE CURRENT OF clause... or UPDATE or DELETE statements with the WHERE CURRENT OF clause... ? Here is the simplest change to make the original grammatical: Given an UPDATE or DELETE with the WHERE CURRENT OF clause, ... I might write the whole thing this way: To execute an UPDATE or DELETE statement bearing the WHERE CURRENT OF clause, ECPG may first execute an implicit MOVE to synchronize the server cursor position with the local cursor position. This can reduce or eliminate the performance benefit of readahead for affected cursors. one of the new sections about readahead should somehow reference the hazard around volatile functions. Done. I don't see the mention in your latest patch. You do mention it for the sqlerrd[2] compatibility stuff. +<varlistentry> +<term><literal>OPEN RETURNS LAST ROW POSITION</literal></term> +<term><literal>OPEN RETURNS 0 FOR LAST ROW POSITION</literal></term> +<listitem> +<para> + When the cursor is opened, it's possible to discover the size of the result set + using <command>MOVE ALL</command> which traverses the result set and move + the cursor back to the beginning using <command>MOVE ABSOLUTE 0</command>. + The size of the result set is returned in sqlca.sqlerrd[2]. +</para> + +<para> + This slows down opening the cursor from the application point of view + but may also have side effects. If the cursor query contains volatile function + calls with side effects, they will be evaluated twice because of traversing + the result set this way during <command>OPEN</command>.
+</para> + +<para> + The default is not to discover. +</para> I mildly oppose having syntax to enable this per-cursor. Readahead is a generally handy feature that I might use in new programs, but this feature is more of a hack for compatibility with some old Informix version. For new code, authors should do their own MOVEs instead of using this option. The patch also adds an undocumented ECPGOPENRETURNSRESULTSETSIZE environment variable to control this. I see no use for such a thing, because a program's dependency on sqlerrd[2] is fixed at build time. If a program does not reference sqlerrd[2] for a cursor, there's no benefit from choosing at runtime to populate that value anyway. If a program does reference it, any option needed to have it set correctly had better be set at build time and apply to every run of the final program. IOW, I suggest having only the ecpg-time option to enable this behavior. Thoughts? I wanted to make it available
Re: [HACKERS] Potential reference miscounts and segfaults in plpython.c
On Thu, Feb 23, 2012 at 8:35 PM, Daniele Varrazzo daniele.varra...@gmail.com wrote: On Mon, Feb 20, 2012 at 12:09 AM, Jan Urbański wulc...@wulczer.org wrote: BTW, that tool is quite handy, I'll have to try running it over psycopg2. Indeed. I'm having a play with it. It is reporting several issues to clean up (mostly on failure at module import). It's also tracebacking here and there: I'll send the author some feedback/patches. I'm patching psycopg in the gcc-python-plugin branch in my dev repos (https://github.com/dvarrazzo/psycopg). The just released psycopg 2.4.5 has had its codebase cleaned up using the gcc static checker. I haven't managed to run it without false positives, however it's been very useful to find missing return value checks and other nuances. No live memory leak has been found: there were a few missing decrefs but only at module init, not in the live code. Full report about the changes in this message: https://fedorahosted.org/pipermail/gcc-python-plugin/2012-March/000229.html. Cheers, -- Daniele -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers patch v18
Apropos of nothing and since I haven't found a particularly good time to say this amidst all the technical discussion, I appreciate very much all the work you've been putting into this. On Thu, Mar 29, 2012 at 12:42 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: serious issues in the discussion we've had so far, especially (1) the points at which triggers are fired aren't consistent between commands, (2) not much thought has been given to what happens if, say, a DDL trigger performs a DDL operation on the table the outer DDL command is due to modify, and (3) we are eventually going to want to trap a much richer set of events than can be captured by the words before and after. Now, you could view this as me throwing up roadblocks to validate my previously-expressed opinion that this wasn't going to get done, but I really, honestly believe that these are important issues and that getting them right is more important than getting something done now. (1) is not hard to fix as soon as we set a policy, which is the most pressing thing we need to do in my mind, whatever we manage to commit in 9.2. I agree that setting a policy is enormously important, not so sure about how easy it is to fix, but we shall see. My primary and overriding goal here is to make sure that we don't make any design decisions, in the syntax or otherwise, that foreclose the ability to add policies that we've not yet dreamed of to future versions of the code. It seems vanishingly unlikely that the first commit will cover all the use cases that people care about, and only slightly more likely that we'll even be able to list them all at that point. Perhaps I'm wrong and we already know what they are, but I'd like to bet against any of us - and certainly me - being that smart. In terms of policies, there are two that seem to me to be very clear: at the very beginning of ProcessUtility, and at the very end of it, *excluding* recursive calls to ProcessUtility that are intended to handle subcommands.
I think we could call the first start or command_start or ddl_command_start or post-parse or pre-locking or something along those lines, and the second end or command_end or ddl_command_end. I think it might be worth including ddl in there because the scope of the current patch really is just DDL (as opposed to say DCL) and having a separate set of firing points for DCL does not seem dumb. If we ever try to do anything with transaction control as you suggested upthread that's likely to also require special handling. Calling it, specifically, ddl_command_start makes the scope very clear. (2) can be addressed with a simple blacklist that would be set just before calling user-defined code, and cleaned when done running it. When in place, the blacklist lookup is easy to implement in a central place (utility.c or cmdtrigger.c) and ereport() when the current command is in the blacklist. e.g. alter table would blacklist alter table and drop table commands on the current object id. Here we're talking about a firing point that we might call ddl_post_name_lookup or something along those lines. I would prefer to handle this by making the following rules; here I'm assuming that we're talking about the case where the object in question is a relation: 1. The trigger fires immediately after RangeVarGetRelidExtended and before any other checks are performed, and especially before any CheckTableNotInUse(). This last is important, because what matters is whether the table's not in use AFTER all possible user-supplied code is executed. If the trigger opens or closes cursors, for example, what matters is whether there are any cursors left open *after* the triggers complete, not whether there were any open on entering the trigger. The same is true of any other integrity check: if there's a chance that the trigger could change something that affects whether the integrity constraint gets fired, then we'd better be darn sure to fire the trigger before we check the integrity constraint.
2. If we fire any triggers at this point, we CommandCounterIncrement() and then re-do RangeVarGetRelidExtended() with the same arguments we passed before. If it doesn't return the same OID we got the first time, we abort with some kind of serialization error. We need to be a little careful about the phrasing of the error message, because it's possible for this to change during command execution even without command triggers if somebody creates a table earlier in the search path with the same unqualified name that was passed to the command, but I think it's OK for the existence of a command-trigger to cause a spurious abort in that very narrow case, as long as the error message includes some appropriate weasel language. Alternatively, we could try to avoid that corner case by rechecking only that a tuple with the right OID is still present and still has the correct relkind, but that seems like it might be a little less
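The fire-then-recheck protocol in rules 1 and 2 can be modeled in a few lines. The sketch below is an illustration only: a dict stands in for the catalogs, plain callables stand in for command triggers, and the exception stands in for the proposed serialization error; none of the names are PostgreSQL APIs.

```python
class SerializationError(Exception):
    """Stand-in for the serialization failure Robert proposes."""

def locked_lookup_with_triggers(catalog, name, triggers):
    """Toy model of the proposed ddl_post_name_lookup protocol.

    catalog  -- dict mapping relation names to OIDs (stand-in for catalogs)
    name     -- the name the DDL command was given
    triggers -- user-supplied callables that may mutate the catalog
    """
    oid = catalog[name]          # initial RangeVarGetRelidExtended()
    for fire in triggers:        # fire triggers before any other checks
        fire(catalog)
    # Re-do the lookup with the same arguments; a different answer means
    # user code moved the target out from under us, so error out rather
    # than operate on the wrong object.
    if catalog.get(name) != oid:
        raise SerializationError(
            "relation %r changed during command trigger execution" % name)
    return oid
```

A benign trigger leaves the lookup stable; a trigger that drops and recreates the relation under the same name trips the recheck, which is exactly the spurious-but-safe abort discussed above.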
Re: [HACKERS] Finer Extension dependencies
On Mar 29, 2012, at 4:42 AM, Robert Haas wrote: 2. Add a new feature to the provides line with every release that does anything other than fix bugs, leading to: provides = foobar-1.1, foobar-1.2, foobar-1.3, foobar-1.4, foobar-1.5, foobar-1.6, foobar-2.0, foobar-2.1, foobar-2.2, foobar-2.3, foobar-3.0, foobar-3.1 This is what I have expected to do. In new releases of pgTAP, I’d probably just add version lines. I might give certain releases names, but probably not. I’m too lazy, and if a given release has more than one new feature, it’d be a bit silly. I’ve never been very keen on this approach, but then I don’t understand packaging systems very well, so it might rock, and I just don’t know how to use it properly. But I cannot tell. Best, David
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
I wrote: Hm ... I just had a different idea. I need to go look at the code again, but I believe that in the problematic cases, the post-analyze hook does not compute a queryId for the optimizable statement. This means that it will arrive at the executor with queryId zero. What if we simply made the executor hooks do nothing when queryId is zero? (Note that this also means that in the problematic cases, the behavior is already pretty wrong because executor costs for *all* statements of this sort are getting merged into one hashtable entry for hash zero.) The attached proposed patch does it that way. It makes the EXPLAIN, SELECT INTO, and DECLARE CURSOR cases behave as expected for utility statements. PREPARE/EXECUTE work a bit funny though: if you have track = all then you get EXECUTE cycles reported against both the EXECUTE statement and the underlying PREPARE. This is because when PREPARE calls parse_analyze_varparams the post-analyze hook doesn't know that this isn't a top-level statement, so it marks the query with a queryId. I don't see any way around that part without something like what I suggested before. However, this behavior seems to me to be considerably less of a POLA violation than the cases involving two identical-looking entries for self-contained statements, and it might even be thought to be a feature not a bug (since the PREPARE entry will accumulate totals for all uses of the prepared statement). So I'm satisfied with it for now. regards, tom lane diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 137b242..bebfff1 100644 *** a/contrib/pg_stat_statements/pg_stat_statements.c --- b/contrib/pg_stat_statements/pg_stat_statements.c *** pgss_post_parse_analyze(ParseState *psta *** 602,610 if (!pgss || !pgss_hash) return; ! 
/* We do nothing with utility statements at this stage */ if (query->utilityStmt) return; /* Set up workspace for query jumbling */ jstate.jumble = (unsigned char *) palloc(JUMBLE_SIZE); --- 602,620 if (!pgss || !pgss_hash) return; ! /* ! * Utility statements get queryId zero. We do this even in cases where ! * the statement contains an optimizable statement for which a queryId ! * could be derived (such as EXPLAIN or DECLARE CURSOR). For such cases, ! * runtime control will first go through ProcessUtility and then the ! * executor, and we don't want the executor hooks to do anything, since ! * we are already measuring the statement's costs at the utility level. ! */ if (query->utilityStmt) + { + query->queryId = 0; return; + } /* Set up workspace for query jumbling */ jstate.jumble = (unsigned char *) palloc(JUMBLE_SIZE); *** pgss_post_parse_analyze(ParseState *psta *** 619,624 --- 629,641 query->queryId = hash_any(jstate.jumble, jstate.jumble_len); /* + * If we are unlucky enough to get a hash of zero, use 1 instead, to + * prevent confusion with the utility-statement case. + */ + if (query->queryId == 0) + query->queryId = 1; + + /* * If we were able to identify any ignorable constants, we immediately * create a hash table entry for the query, so that we can record the * normalized form of the query string. If there were no such constants, *** pgss_ExecutorStart(QueryDesc *queryDesc, *** 649,655 else standard_ExecutorStart(queryDesc, eflags); ! if (pgss_enabled()) { /* * Set up to track total elapsed time in ExecutorRun. Make sure the --- 666,677 else standard_ExecutorStart(queryDesc, eflags); ! /* ! * If query has queryId zero, don't track it. This prevents double ! * counting of optimizable statements that are directly contained in ! * utility statements. ! */ ! if (pgss_enabled() && queryDesc->plannedstmt->queryId != 0) { /* * Set up to track total elapsed time in ExecutorRun.
Make sure the *** pgss_ExecutorFinish(QueryDesc *queryDesc *** 719,731 static void pgss_ExecutorEnd(QueryDesc *queryDesc) { ! if (queryDesc->totaltime && pgss_enabled()) ! { ! uint32 queryId; ! ! /* Query's ID should have been filled in by post-analyze hook */ ! queryId = queryDesc->plannedstmt->queryId; /* * Make sure stats accumulation is done. (Note: it's okay if several * levels of hook all do this.) --- 741,750 static void pgss_ExecutorEnd(QueryDesc *queryDesc) { ! uint32 queryId = queryDesc->plannedstmt->queryId; + if (queryId != 0 && queryDesc->totaltime && pgss_enabled()) + { /* * Make sure stats accumulation is done. (Note: it's okay if several * levels of hook all do this.)
Re: [HACKERS] Finer Extension dependencies
On Thu, Mar 29, 2012 at 1:48 PM, David E. Wheeler da...@justatheory.com wrote: On Mar 29, 2012, at 4:42 AM, Robert Haas wrote: 2. Add a new feature to the provides line with every release that does anything other than fix bugs, leading to: provides = foobar-1.1, foobar-1.2, foobar-1.3, foobar-1.4, foobar-1.5, foobar-1.6, foobar-2.0, foobar-2.1, foobar-2.2, foobar-2.3, foobar-3.0, foobar-3.1 This is what I have expected to do. In new releases of pgTAP, I’d probably just add version lines. I might give certain releases names, but probably not. I’m too lazy, and if a given release has more than one new feature, it’d be a bit silly. I’ve never been very keen on this approach, but then I don’t understand packaging systems very well, so it might rock, and I just don’t know how to use it properly. But I cannot tell. So the idea is that you're actually supposed to separately catalog each feature you added (e.g. each new function), so that people can depend specifically on those features. Then if you remove the function again in some distant future, you stop advertising that feature (but you can still advertise any other features you added in the same release). If you're not going to do that, then this feature as proposed is strictly worse than figuring out a way to compare version numbers, because it's more work, some people will not bother to update the provides line, and other people will sometimes forget it. I don't really have the foggiest idea how people using other packaging systems handle this. It seems like it would be a huge pain in the rear end to be continually adding Provides: lines to RPMs for every new feature that a new version of a package offers, not to mention that you'd hardly want the corresponding Requires: lines to have to enumerate all the public interfaces those packages used just in case one of them ever went away. I have a feeling I'm missing part of the picture here, somehow. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Finer Extension dependencies
Robert Haas robertmh...@gmail.com writes: provides = foobar-1.1, foobar-1.2, foobar-1.3, foobar-1.4, foobar-1.5, foobar-1.6, foobar-2.0, foobar-2.1, foobar-2.2, foobar-2.3, foobar-3.0, foobar-3.1 This is what I have expected to do. In new releases of pgTAP, I’d probably just add version lines. I might give certain releases names, but probably not. I’m too lazy, and if a given release has more than one new feature, it’d be a bit silly. I’ve never been very keen on this approach, but then I don’t understand packaging systems very well, so it might rock, and I just don’t know how to use it properly. But I cannot tell. So the idea is that you're actually supposed to separately catalog each feature you added (e.g. each new function), so that people can depend specifically on those features. Then if you remove the function again in some distant future, you stop advertising that feature (but you can still advertise any other features you added in the same release). If you're not going to do that, then this feature as proposed is strictly worse than figuring out a way to compare version numbers, because it's more work, some people will not bother to update the provides line, and other people will sometimes forget it. I don't really have the foggiest idea how people using other packaging systems handle this. It seems like it would be a huge pain in the rear end to be continually adding Provides: lines to RPMs for every new feature that a new version of a package offers, not to mention that you'd hardly want the corresponding Requires: lines to have to enumerate all the public interfaces those packages used just in case one of them ever went away. I have a feeling I'm missing part of the picture here, somehow. Basically those examples are far too fine grained. Let's get back to the example of ip4r that I know better. 
I would imagine those provides lines in there: provides = ip4, ip4r provides = ip4, ip4r, ip4r_gist, ip4r_gin Then when adding ipv6 support and datatypes: provides = ip4, ip4r, ip6, ip6r provides = ip4, ip4r, ip6, ip6r, ip4r_gist, ip4r_gin, ip6r_gist, ip6r_gin Pick any one as what I would consider a realistic example. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
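Checking a dependency against provides lines like these is just a set-membership test across the installed control files. A minimal sketch of that resolution step, using the ip4r feature names from the example above (the function itself is illustrative, not the patch's implementation):

```python
def missing_features(requires, installed):
    """Return the required feature names no installed extension provides.

    requires  -- list of feature names from a 'requires' control line
    installed -- dict mapping extension name -> set of features its
                 control file advertises in its 'provides' line
    """
    provided = set()
    for features in installed.values():
        provided.update(features)
    # Preserve the order of the requires line in the error report.
    return [feature for feature in requires if feature not in provided]
```

An extension requiring `ip6r` would thus install cleanly only once the ipv6-capable ip4r release, whose provides line advertises `ip6r`, is present.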
Re: [HACKERS] Finer Extension dependencies
Robert Haas robertmh...@gmail.com writes: So the idea is that you're actually supposed to separately catalog each feature you added (e.g. each new function), so that people can depend specifically on those features. I don't really have the foggiest idea how people using other packaging systems handle this. It seems like it would be a huge pain in the rear end to be continually adding Provides: lines to RPMs for every new feature that a new version of a package offers, not to mention that you'd hardly want the corresponding Requires: lines to have to enumerate all the public interfaces those packages used just in case one of them ever went away. I have a feeling I'm missing part of the picture here, somehow. Yeah. AFAIK, nobody actually does that. In my experience with Red Hat packages, so-called virtual Provides (which are exactly equivalent to this proposed feature) are used only for cases where there is or is planned to be more than one package that can supply a given API, and the Provides is really more of a logical package name than an identifier of a feature as such. When people want to depend on a feature that was added after initial release of a package, they invariably use versioned dependencies like Requires: foobar >= nnn. And it's also pretty common to use such a dependency because you need to have a bug fixed that was fixed in version nnn; so assuming that you only need feature names for, er, features may be a mistake too. So if you look at common practice, this whole idea is wrong and we ought to define a way to compare version numbers instead. I'm not entirely sure that I want to go there yet, but it certainly bears considering as a time-tested alternative design. regards, tom lane
Re: [HACKERS] Finer Extension dependencies
On Wed, 2012-03-28 at 23:00 -0700, Hitoshi Harada wrote: I totally agree with Robert's point that one feature is not standardized and nobody can tell how you can depend on the feature in the end. Mind you, I've never heard about building dependency by its name as a string in other packaging system. If you want to introduce the concept of version dependency not feature name dependency, do *it*; I don't think feature dependency solves it. The Python setuptools (a.k.a. distutils a.k.a. distribute a.k.a. eggs a.k.a. easy_install a.k.a. dont-get-me-started) system supports feature names that a package can provide, but it uses them for a different purpose. The idea is that a package foo can depend on a package bar[somethingextra], and then bar itself would declare its dependencies such that it depends, say, on ham, but if feature somethingextra is required, it also depends on eggs. This is actually quite useful, but it breaks down when you, say, want to wrap your egg into a Debian package.
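The setuptools "extras" mechanism Peter describes amounts to conditional dependency expansion. The sketch below models it with plain dicts (the package names come from the ham/eggs example; the resolver itself is a toy, not setuptools code):

```python
def resolve(requirement, dependencies, extras):
    """Expand a requirement like "bar[somethingextra]" into a dependency set.

    dependencies -- dict: package -> list of unconditional dependencies
    extras       -- dict: package -> {extra name -> extra dependencies}
    """
    name, _, rest = requirement.partition("[")
    wanted = [e.strip() for e in rest.rstrip("]").split(",")] if rest else []
    deps = set(dependencies[name])        # bar always depends on ham
    for extra in wanted:
        deps.update(extras[name][extra])  # bar[somethingextra] adds eggs
    return deps
```

Requiring plain `bar` yields only the base set, while `bar[somethingextra]` pulls in the extra's additions as well, which is the fine-grained behavior that has no direct counterpart in a dpkg/rpm dependency line.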
Re: [HACKERS] Finer Extension dependencies
On Thu, 2012-03-29 at 09:51 +0200, Dimitri Fontaine wrote: I don't want to introduce version dependency, because I don't think we need it. If you want to compare what we're doing here with say debian packaging, then look at how they package libraries. The major version number is now part of the package name and you depend on that directly. So let's take the shortcut to directly depend on the “feature” name. For a PostgreSQL extension example, we could pick ip4r. That will soon include support for ipv6 (it's already done code-wise, missing docs update). If you want to use ip4r for storing ipv6, you will simply require “ip6r” or whatever feature name is provided by the extension including it. Note that in Debian, virtual package names (the ones you give in the Provides line) are effectively centrally managed. Most are specified by the policy, the rest are managed between the affected packages. This rests on the assumption that very little outside packaging that deviates from the official Debian packaging goes on. Since we don't have any central coordinator or authority of that kind, we need to design the system differently. At the very least, I would suggest that feature names are per-extension. So in the above example you could depend on something like ipv4[ipv6].
Re: [HACKERS] ECPG FETCH readahead
On Thu, Mar 29, 2012 at 01:03:41PM -0400, Noah Misch wrote: Still, we're looking at dedicated ECPG syntax, quite visible even to folks with no interest in Informix. We have eschewed littering our syntax with compatibility aids, and I like it that way. IMO, an option to the ecpg preprocessor is an acceptable level of muddle to provide this aid. A DECLARE CURSOR syntax extension goes too far. +1 from me. Let's not add special ecpg sql syntax if we don't absolutely have to. Michael -- Michael Meskes Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org Jabber: michael.meskes at googlemail dot com VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL
Re: [HACKERS] Finer Extension dependencies
On Thu, Mar 29, 2012 at 2:28 PM, Peter Eisentraut pete...@gmx.net wrote: On Wed, 2012-03-28 at 23:00 -0700, Hitoshi Harada wrote: I totally agree with Robert's point that one feature is not standardized and nobody can tell how you can depend on the feature in the end. Mind you, I've never heard about building dependency by its name as a string in other packaging system. If you want to introduce the concept of version dependency not feature name dependency, do *it*; I don't think feature dependency solves it. The Python setuptools (a.k.a. distutils a.k.a. distribute a.k.a. eggs a.k.a. easy_install a.k.a. dont-get-me-started) system supports feature names that a package can provide, but it uses them for a different purpose. The idea is that a package foo can depend on a package bar[somethingextra], and then bar itself would declare its dependencies such that it depends, say, on ham, but if feature somethingextra is required, it also depends on eggs. This is actually quite useful, Wow. That is complex, but I agree that it's useful. but it breaks down when you, say, want to wrap your egg into a Debian package. *blink* Huh? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Finer Extension dependencies
Peter Eisentraut pete...@gmx.net writes: At the very least, I would suggest that feature names are per-extension. Yeah, I had about come to that conclusion too. A global namespace for them would be a mistake given lack of central coordination. regards, tom lane
Re: [HACKERS] Finer Extension dependencies
On Thu, Mar 29, 2012 at 2:25 PM, Tom Lane t...@sss.pgh.pa.us wrote: Yeah. AFAIK, nobody actually does that. In my experience with Red Hat packages, so-called virtual Provides (which are exactly equivalent to this proposed feature) are used only for cases where there is or is planned to be more than one package that can supply a given API, and the Provides is really more of a logical package name than an identifier of a feature as such. Note that this feature as designed will NOT service that use-case, as discussed upthread, and in fact it would probably need a full rewrite to do so, because the principle of operation is based on the dependency mechanism, which doesn't support this. In fact, ALTER EXTENSION .. UPDATE as implemented in this patch is already doing some clever tomfoolery to make things work as the user will expect - when you update an extension, it removes the dependencies between the extension and its extension features, then drops any extension features that are no longer needed, then puts back the dependencies. This produces pretty logical error messages, but I don't think it generalizes to any of the other things we might want to do (e.g. version number dependencies) so while we could commit this patch the way it is now I think doing anything more complex will require a different approach. When people want to depend on a feature that was added after initial release of a package, they invariably use versioned dependencies like Requires: foobar = nnn. And it's also pretty common to use such a dependency because you need to have a bug fixed that was fixed in version nnn; so assuming that you only need feature names for, er, features may be a mistake too. Hmm, interesting. So if you look at common practice, this whole idea is wrong and we ought to define a way to compare version numbers instead. I'm not entirely sure that I want to go there yet, but it certainly bears considering as a time-tested alternative design. 
I think that's definitely worth considering. One idea would be to mandate that you can only use the version-dependencies feature if both versions are of some specific form (e.g. two or three integers separated by periods). If you try to depend on foo >= 1.b or if you depend on foo >= 1.3 but the installed version is 1.b, it errors out. Frankly, I'm not sure we bet on the right horse in not mandating a version numbering scheme from the beginning. But given that we didn't, we probably don't want to get too forceful about it too quickly. However, we could ease into it by documenting a recommended numbering scheme and making features like version-dependencies work only when that scheme is used. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
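For what it's worth, the restricted scheme Robert describes (two or three integers separated by periods) is cheap to implement. A minimal sketch in C — purely illustrative, not from the patch:

```c
#include <stdbool.h>

/*
 * Parse a version of the form "N.N" or "N.N.N" (decimal integers only).
 * Returns false for anything else -- e.g. "1.b" -- which under the
 * proposed rule would make a version dependency error out.
 */
static bool
parse_version(const char *s, int out[3])
{
    int part = 0;

    out[0] = out[1] = out[2] = 0;
    for (;;)
    {
        if (*s < '0' || *s > '9')
            return false;       /* each component must start with a digit */
        while (*s >= '0' && *s <= '9')
            out[part] = out[part] * 10 + (*s++ - '0');
        if (*s == '\0')
            return part >= 1;   /* saw two or three components */
        if (*s != '.' || ++part > 2)
            return false;
        s++;
    }
}

/* Component-wise comparison; a missing third component counts as 0. */
static int
compare_versions(const int a[3], const int b[3])
{
    for (int i = 0; i < 3; i++)
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    return 0;
}
```

With something like this in place, a dependency such as foo >= 1.3 reduces to parse-then-compare, with a hard error whenever either side fails to parse.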
Re: [HACKERS] Finer Extension dependencies
On Thu, 2012-03-29 at 14:39 -0400, Robert Haas wrote: but it breaks down when you, say, want to wrap your egg into a Debian package. *blink* Huh? Well, you can't represent that mechanism in a Debian (or RPM) package dependency. So the alternatives are to make it a Recommends and add a free-form explanation somewhere, or make it a hard Depends, thus overriding the fine-grained dependency setup. More to the point, I think any mechanism we dream up here that is smarter than what dpkg or rpm can represent is going to be useful only in niche situations.
Re: [HACKERS] Finer Extension dependencies
On Mar 29, 2012, at 11:48 AM, Robert Haas wrote: Frankly, I'm not sure we bet on the right horse in not mandating a version numbering scheme from the beginning. But given that we didn't, we probably don't want to get too forceful about it too quickly. However, we could ease into it by documenting a recommended numbering scheme and making features like version-dependencies work only when that scheme is used. PGXN mandates semantic versions for this reason (currently v1.0.0): http://semver.org/spec/v1.0.0.html Removes all the ambiguity. David
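The ambiguity David mentions is largely about ordering: under a mandated numeric scheme, "1.10.0" is newer than "1.9.0", whereas naive string comparison says the opposite. A tiny illustration (not PGXN code; validation and pre-release tags omitted):

```c
#include <stdio.h>

/*
 * Compare two "X.Y.Z" versions numerically, the way semver intends.
 * This only exists to show why lexical comparison of version strings
 * gives wrong answers.
 */
static int
semver_cmp(const char *a, const char *b)
{
    int ax = 0, ay = 0, az = 0;
    int bx = 0, by = 0, bz = 0;

    sscanf(a, "%d.%d.%d", &ax, &ay, &az);
    sscanf(b, "%d.%d.%d", &bx, &by, &bz);
    if (ax != bx) return ax < bx ? -1 : 1;
    if (ay != by) return ay < by ? -1 : 1;
    if (az != bz) return az < bz ? -1 : 1;
    return 0;
}
```

Here strcmp("1.10.0", "1.9.0") is negative (lexically '1' sorts before '9'), while semver_cmp correctly reports 1.10.0 as the newer version.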
Re: [HACKERS] Finer Extension dependencies
Tom Lane t...@sss.pgh.pa.us writes: Peter Eisentraut pete...@gmx.net writes: At the very least, I would suggest that feature names are per-extension. Yeah, I had about come to that conclusion too. A global namespace for them would be a mistake given lack of central coordination. That's how I did it first, but Alvaro was opposed to that because it allows for more than one extension to provide for the same feature name. http://archives.postgresql.org/pgsql-hackers/2012-03/msg01425.php Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Finer Extension dependencies
Dimitri Fontaine dimi...@2ndquadrant.fr writes: Tom Lane t...@sss.pgh.pa.us writes: Peter Eisentraut pete...@gmx.net writes: At the very least, I would suggest that feature names are per-extension. Yeah, I had about come to that conclusion too. A global namespace for them would be a mistake given lack of central coordination. That's how I did it first, but Alvaro was opposed to that because it allows for more than one extension to provide for the same feature name. http://archives.postgresql.org/pgsql-hackers/2012-03/msg01425.php Right, but the question that has to be considered is how often would that be intentional as opposed to an undesirable name collision. I think Hitoshi was right upthread that it will seldom if ever be the case that somebody is independently reimplementing somebody else's API, so the use-case for intentional substitution seems thin. regards, tom lane
Re: [HACKERS] ECPG FETCH readahead
On 2012-03-29 20:34, Michael Meskes wrote: On Thu, Mar 29, 2012 at 01:03:41PM -0400, Noah Misch wrote: Still, we're looking at dedicated ECPG syntax, quite visible even to folks with no interest in Informix. We have eschewed littering our syntax with compatibility aids, and I like it that way. IMO, an option to the ecpg preprocessor is an acceptable level of muddle to provide this aid. A DECLARE CURSOR syntax extension goes too far. +1 from me. Let's not add special ecpg sql syntax if we don't absolutely have to. Michael This is what I waited for, finally another opinion. I will delete this from the grammar. What about the other question about the 0 vs 1 distinction? Without the second patch, 0 drives the cursor with the old ECPGdo() code. All other values drive the cursor via the new ECPGopen() et al. Should we keep this behaviour? In this case the 0 vs 1 distinction is not needed in add_cursor() as the 0 value doesn't even reach ECPGopen(). But if we consider including the second patch as well, should we keep the NO READAHEAD option in the grammar and in cursor.c? After solving the problem with WHERE CURRENT OF, this and the 0-vs-1 distinction start to feel pointless. And what about the override via the environment variable? Should it work only upwards? Best regards, Zoltán Böszörményi -- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Re: [HACKERS] HTTP Frontend? (and a brief thought on materialized views)
On Thu, Mar 29, 2012 at 8:04 AM, Andrew Dunstan and...@dunslane.net wrote: 1. I've been in discussion with some people about adding simple JSON extract functions. We already have some (i.e. xpath()) for XML. 2. You might find htsql http://htsql.org/ interesting. My colleagues and I have thought about this quite a lot. Even without changing any of FEBE's semantics (so far, I think they're basically fine, with the glaring problem of hidden session state) I think it may be a good time to experiment with supporting SPDY (clearly, 9.3+), which solves some of the problems of HTTP that make it pretty unusable for interactive database workloads. For a long time, it looked like SPDY would become the IETF's HTTP 2.0. But the future is still clouded, as just recently Microsoft introduced their own mostly-compatible design. So that's a bit scary. More concretely, here's what I really want. The first two concerns are almost entirely technical, the latter two social, in that they rely on the notion of 'common' or 'standard':
* A standard frame extension format. For example, last I checked Postgres-XC, it required snapshot information to be passed, and it'd be nice if, instead of having to hack the protocol, they could attach an X-Snapshot-Info header, or whatever. This could also be a nice way to pass out-of-band hint information of all sorts.
* Something similar to the HTTP Host header, so that one can route to databases without having to conflate database identity with the actual port being connected to. Yes, theoretically it can be done with weird startup packet gyrations, but that is firmly in the weird category.
* HTTP -- and *probably* its hypothetical progeny -- are more common than FEBE packets, and a lot of incidental complexity of writing routers is reduced by the commonality of routing HTTP traffic.
* Protocol compression -- but a bit of sand in the gears is *which* compression -- for database workloads, the performance of zlib can be a meaningful bottleneck.
Lastly, a case that cannot as easily be fixed without some more thinking is leveraging the caching semantics of HTTP. I think people would really, really like that, if they could get away from having to hand-roll their own cache regeneration in common cases. -- fdr
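The Host-header point is essentially "give the router a database name it can see without deep packet inspection". A toy sketch of what such routing logic could look like (hypothetical naming convention; nothing like this exists in the server or any proxy named here):

```c
#include <stddef.h>
#include <string.h>

/*
 * Map a hypothetical "Host: <dbname>.db.example.com" header value to a
 * backend database name, the way an HTTP-aware proxy could route
 * connections without conflating database identity with a port number.
 * Returns the length of the extracted name, or 0 if the suffix doesn't
 * match or the name won't fit.
 */
static size_t
route_database(const char *host, const char *suffix,
               char *dbname, size_t dbsz)
{
    size_t hlen = strlen(host);
    size_t slen = strlen(suffix);

    if (hlen <= slen || strcmp(host + hlen - slen, suffix) != 0)
        return 0;
    hlen -= slen;                   /* length of the subdomain part */
    if (hlen == 0 || hlen >= dbsz)
        return 0;
    memcpy(dbname, host, hlen);
    dbname[hlen] = '\0';
    return hlen;
}
```

The FEBE equivalent today is the database field of the startup packet, which a router can only see by speaking the full protocol — that is the "weird startup packet gyrations" part.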
Re: [HACKERS] HTTP Frontend? (and a brief thought on materialized views)
On Thu, Mar 29, 2012 at 12:57 PM, Daniel Farina dan...@heroku.com wrote: D'oh, I munged the order. More technical concerns: * Protocol compression -- but a bit of sand in the gears is *which* compression -- for database workloads, the performance of zlib can be a meaningful bottleneck. * Something similar to the HTTP Host header, so that one can route to databases without having to conflate database identity with the actual port being connected to. Yes, theoretically it can be done with weird startup packet gyrations, but that is firmly in the weird category. Socialish (but no less important): * A standard frame extension format. For example, last I checked Postgres-XC, it required snapshot information to be passed, and it'd be nice that instead of having to hack the protocol that they could attach an X-Snapshot-Info header, or whatever. This could also be a nice way to pass out-of-band hint information of all sorts. * HTTP -- and *probably* its hypothetical progeny -- are more common than FEBE packets, and a lot of incidental complexity of writing routers is reduced by the commonality of routing HTTP traffic. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
I wrote: ... PREPARE/EXECUTE work a bit funny though: if you have track = all then you get EXECUTE cycles reported against both the EXECUTE statement and the underlying PREPARE. This is because when PREPARE calls parse_analyze_varparams the post-analyze hook doesn't know that this isn't a top-level statement, so it marks the query with a queryId. I don't see any way around that part without something like what I suggested before. However, this behavior seems to me to be considerably less of a POLA violation than the cases involving two identical-looking entries for self-contained statements, and it might even be thought to be a feature not a bug (since the PREPARE entry will accumulate totals for all uses of the prepared statement). So I'm satisfied with it for now. Actually, there's an easy hack for that too: we can teach the ProcessUtility hook to do nothing (and in particular not increment the nesting level) when the statement is an ExecuteStmt. This will result in the executor time being blamed on the original PREPARE, whether or not you have enabled tracking of nested statements. That seems like a substantial win to me, because right now you get a distinct EXECUTE entry for each textually-different set of parameter values, which seems pretty useless. This change would make use of PREPARE/EXECUTE behave very nearly the same in pg_stat_statement as use of protocol-level prepared statements. About the only downside I can see is that the cycles expended on evaluating the EXECUTE's parameters will not be charged to any pg_stat_statement entry. Since those can be expressions, in principle this might be a non-negligible amount of execution time, but in practice it hardly seems likely that anyone would care about it. Barring objections I'll go fix this, and then this patch can be considered closed except for possible future tweaking of the sticky-entry decay rule. 
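In stub form, the tweak Tom describes amounts to an early return in the utility hook. The types below are simplified stand-ins for illustration, not the actual pg_stat_statements code:

```c
/* Simplified stand-ins for the relevant PostgreSQL node types. */
typedef enum { T_ExecuteStmt, T_PrepareStmt, T_CreateStmt } NodeTag;
typedef struct Node { NodeTag type; } Node;

static int nested_level = 0;     /* pg_stat_statements' nesting depth */
static int utility_entries = 0;  /* statements tracked as utility commands */

/*
 * Sketch of the proposed behaviour: EXECUTE is passed through without
 * incrementing the nesting level or creating its own entry, so executor
 * time gets charged to the underlying PREPARE instead of to one entry
 * per textually-distinct parameter list.
 */
static void
pgss_ProcessUtility(Node *parsetree)
{
    if (parsetree->type != T_ExecuteStmt)
    {
        nested_level++;
        utility_entries++;   /* an entry would be created and timed here */
        nested_level--;
    }
    /* either way, the statement itself still runs
     * (standard_ProcessUtility in the real hook) */
}
```

Because the nesting level stays at zero for EXECUTE, the executor hooks see it as a top-level statement and attribute the cycles to the PREPARE's queryId, whether or not nested-statement tracking is enabled.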
regards, tom lane
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On Thu, Mar 29, 2012 at 4:05 PM, Tom Lane t...@sss.pgh.pa.us wrote: I wrote: ... PREPARE/EXECUTE work a bit funny though: if you have track = all then you get EXECUTE cycles reported against both the EXECUTE statement and the underlying PREPARE. This is because when PREPARE calls parse_analyze_varparams the post-analyze hook doesn't know that this isn't a top-level statement, so it marks the query with a queryId. I don't see any way around that part without something like what I suggested before. However, this behavior seems to me to be considerably less of a POLA violation than the cases involving two identical-looking entries for self-contained statements, and it might even be thought to be a feature not a bug (since the PREPARE entry will accumulate totals for all uses of the prepared statement). So I'm satisfied with it for now. Actually, there's an easy hack for that too: we can teach the ProcessUtility hook to do nothing (and in particular not increment the nesting level) when the statement is an ExecuteStmt. This will result in the executor time being blamed on the original PREPARE, whether or not you have enabled tracking of nested statements. That seems like a substantial win to me, because right now you get a distinct EXECUTE entry for each textually-different set of parameter values, which seems pretty useless. This change would make use of PREPARE/EXECUTE behave very nearly the same in pg_stat_statement as use of protocol-level prepared statements. About the only downside I can see is that the cycles expended on evaluating the EXECUTE's parameters will not be charged to any pg_stat_statement entry. Since those can be expressions, in principle this might be a non-negligible amount of execution time, but in practice it hardly seems likely that anyone would care about it. Barring objections I'll go fix this, and then this patch can be considered closed except for possible future tweaking of the sticky-entry decay rule. 
After reading your last commit message, I was wondering if something like this might be possible, so +1 from me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] query cache
On Fri, Mar 23, 2012 at 2:54 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: What I think is more common is the repeated submission of queries that are *nearly* identical, but with either different parameter bindings or different constants. It would be nice to have some kind of cache that would allow us to avoid the overhead of parsing and planning nearly identical statements over and over again, but the trick is that you have to fingerprint the query to notice that's happening in the first place, and the fingerprinting has to cost less than what the cache saves you. I don't know whether that's possible, but I suspect it's far from easy. The traditional solution to this is to make the application do it, ie, parameterized prepared statements. Since that has direct benefits to the application as well, in that it can use out-of-line values and thereby avoid quoting and SQL-injection risks, it's not apparent that it's a good idea to expend lots of sweat to reverse-engineer parameterization from a collection of unparameterized queries. But there are several disadvantages to unparameterized queries, too. One, it requires more messages at the protocol level, which is not free. Two, whatever abstraction layer you're using to submit queries to the database has to support it, and not all of them do. And three, you get worse query plans if lack of knowledge about the specific query parameters proves to be essential. I know you've done some work on this last point for 9.2, but I'm fuzzy on the details and on how much benefit we'll get out of it in real-world cases. Interestingly, Peter Geoghegan's blog post on the pg_stat_statements patch you just committed[1] claims that the overhead of fingerprinting queries was only 1-2.5%, which is less than I would have thought, so if we ever get to the point where we're fairly sure we've got problem three licked, it might make sense to revisit this due to problems one and two. 
It's also probably worth keeping in mind the next time we bump the protocol version: it would be nice to have a way of doing prepare-bind-execute in a single protocol message, which I believe to be not possible at present. [1] http://pgeoghegan.blogspot.com/2012/03/much-improved-statement-statistics.html -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
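On the fingerprinting cost: the essence is collapsing literals so that nearly-identical statements hash to one entry. The pg_stat_statements work does this on the post-parse-analysis tree; the string-level version below is only a toy to convey the idea (it would mangle digits inside identifiers, miss dollar quoting, and so on):

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Replace quoted strings and integer literals with '?' so that queries
 * differing only in constants produce the same fingerprint text.
 */
static void
fingerprint(const char *sql, char *out, size_t outsz)
{
    size_t j = 0;

    for (size_t i = 0; sql[i] != '\0' && j + 1 < outsz; i++)
    {
        if (sql[i] == '\'')
        {
            i++;                            /* skip the string literal */
            while (sql[i] != '\0' && sql[i] != '\'')
                i++;
            out[j++] = '?';
        }
        else if (isdigit((unsigned char) sql[i]))
        {
            while (isdigit((unsigned char) sql[i + 1]))
                i++;                        /* swallow the whole number */
            out[j++] = '?';
        }
        else
            out[j++] = sql[i];
    }
    out[j] = '\0';
}
```

Two statements differing only in their constants then fingerprint identically, which is exactly the property a plan cache (or pg_stat_statements) needs before it can aggregate them.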
Re: [HACKERS] Command Triggers patch v18
Robert Haas robertmh...@gmail.com writes: Well, preceding and before are synonyms, so I don't see any advantage in that change. But I really did mean AT permissions_checking time, not before or after it. That is, we'd have a hook where instead of doing something like this: aclresult = pg_class_aclcheck(reloid, GetUserId(), ACL_SELECT); ...we'd call a superuser-supplied trigger function and let it make the Wow. I don't think I like that at all. It's indeed powerful, but how do you go explaining and fixing unanticipated behavior with such things in production? It looks too much like an invitation to break a very careful design where each facility has to prove itself to get in. So yes, that's not at all what I was envisioning, I still think that most event specs for event triggers are going to have to happen in between our different internal implementations of a given facility. Maybe we should even use BEFORE in cases I had in mind and AT for cases like the one you're picturing? CREATE EVENT TRIGGER name ON event_name (event_subtype_name [, ...]) EXECUTE PROCEDURE function_name(args); I would amend that syntax to allow for a WHEN expr much like in the DML trigger case, where the expression can play with the variables exposed at the time of calling the trigger. create event trigger prohibit_some_ddl preceding timing spec when tag in ('CREATE TABLE', 'ALTER TABLE') execute procedure throw_an_error(); We could do it that way, but the tag in ('CREATE TABLE', 'ALTER TABLE') syntax will be tricky to evaluate. I'd really prefer NOT to need to start up and shut down the executor to figure out whether the trigger has to fire. If we are going to do it that way then we need to carefully define what variables are available and what values they get. I think that most people's event-filtering needs will be simple enough to be served by a more declarative syntax.
We could then maybe restrict that idea so that the syntax is more like when trigger variable in (literal[, ...]) So that we just have to store the array of strings and support only one operation there. We might want to go as far as special casing the object id to store an oid vector there rather than a text array, but we could also decide not to support per-oid command triggers in the first release and remove that from the list of accepted trigger variables. I also don't like the way you squeeze in the drop cascade support here by requalifying it as a timing spec, which it's clearly not. On subcommand_start, what is the tag set to? the main command or the subcommand? if the former, where do you find the current subcommand tag? I don't think subcommand_start should be a timing spec either. That's up for grabs, but I was thinking the subcommand. For example, consider: alter table t alter column a add column b int, disable trigger all; I would imagine this having a firing sequence something like this:
ddl_command_start:alter_table
ddl_command_permissions_check:alter_table
ddl_command_name_lookup:alter_table
alter_table_subcommand_prep:add_column
alter_table_subcommand_prep:disable_trigger_all
alter_table_subcommand_start:add_column
sql_create:column
alter_table_subcommand_end:add_column
alter_table_subcommand_start:disable_trigger_all
alter_table_subcommand_end:disable_trigger_all
ddl_command_end:alter_table
Lots of room for argument there, of course. Yeah well, in my mind the alter table actions are not subcommands yet because they don't go back to standard_ProcessUtility(). The commands in CREATE SCHEMA do that, same for create table foo(id serial primary key) and its sequence and index. We could maybe add another variable called context in the command trigger procedure API that would generally be NULL and would be set to the main command tag if we're running a subcommand.
We could still support alter table commands as sub commands as far as the event triggers are concerned, and document that you don't get a parse tree there even when implementing the trigger in C. Yeah, I would not try to pass the parse tree to every event trigger, just the ones where it's well-defined. For drops, I would just say, hey, this is what we dropped. The user can log it or ereport, but that's about it. Still, that's clearly enough to do a lot of really interesting stuff, replication being the most obvious example. Yes the use case is important enough to want to support it. Now, to square it into the design. I think it should fire as a “normal” event trigger, with the context set to e.g. DROP TYPE, the tag set to this object classid (e.g. DROP FUNCTION). It's easy enough to implement given that CreateCommandTag() already does switch (((DropStmt *) parsetree)->removeType), we just need to move that code into another function and reuse it at the right places. It's only missing DROP COLUMN apparently, I would have to add that. So you would know that the trigger is firing for a DROP FUNCTION that
Re: [HACKERS] query cache
Robert Haas robertmh...@gmail.com writes: Interestingly, Peter Geoghegan's blog post on the pg_stat_statements patch you just committed[1] claims that the overhead of fingerprinting queries was only 1-2.5%, which is less than I would have thought, so if we ever get to the point where we're fairly sure we've got problem three licked, it might make sense to revisit this due to problems one and two. Maybe. It's also probably worth keeping in mind the next time we bump the protocol version: it would be nice to have a way of doing prepare-bind-execute in a single protocol message, which I believe to be not possible at present. Huh? That's the standard way of doing it, actually. You send Prepare/Bind/Execute/Sync in one packet, and wait for results. This requires that you understand the query well enough to perform Bind without seeing a Describe result, but that seems essential to any one-round-trip case anyway. It's not the protocol design that is holding back anybody who wants to do that. regards, tom lane
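Concretely, the v3 protocol lets a client concatenate Parse ('P'), Bind ('B'), Execute ('E'), and Sync ('S') into one buffer and issue a single write — this is in fact what libpq's PQexecParams does internally. A rough framing sketch for a parameterless query, per the documented FEBE message layouts (error handling and parameters omitted):

```c
#include <stdint.h>
#include <string.h>

/* Append a big-endian int32 / int16 at pos; return the new position. */
static size_t put_i32(unsigned char *buf, size_t pos, uint32_t v)
{
    buf[pos] = (unsigned char)(v >> 24);
    buf[pos + 1] = (unsigned char)(v >> 16);
    buf[pos + 2] = (unsigned char)(v >> 8);
    buf[pos + 3] = (unsigned char)v;
    return pos + 4;
}
static size_t put_i16(unsigned char *buf, size_t pos, uint16_t v)
{
    buf[pos] = (unsigned char)(v >> 8);
    buf[pos + 1] = (unsigned char)v;
    return pos + 2;
}
/* Append a NUL-terminated string. */
static size_t put_str(unsigned char *buf, size_t pos, const char *s)
{
    size_t n = strlen(s) + 1;
    memcpy(buf + pos, s, n);
    return pos + n;
}
/* Each message is: type byte, then an int32 length counting itself but
 * not the type byte; the length is backpatched once the body is done. */
static size_t begin_msg(unsigned char *buf, size_t pos, char type, size_t *len_at)
{
    buf[pos++] = (unsigned char)type;
    *len_at = pos;
    return pos + 4;
}
static void end_msg(unsigned char *buf, size_t len_at, size_t end)
{
    put_i32(buf, len_at, (uint32_t)(end - len_at));
}

/* Frame Parse + Bind + Execute + Sync into one buffer, suitable for a
 * single write() -- i.e. one network round trip. */
static size_t build_one_roundtrip(unsigned char *buf, const char *query)
{
    size_t pos = 0, len_at;

    pos = begin_msg(buf, pos, 'P', &len_at);    /* Parse */
    pos = put_str(buf, pos, "");                /* unnamed statement */
    pos = put_str(buf, pos, query);
    pos = put_i16(buf, pos, 0);                 /* no parameter type OIDs */
    end_msg(buf, len_at, pos);

    pos = begin_msg(buf, pos, 'B', &len_at);    /* Bind */
    pos = put_str(buf, pos, "");                /* unnamed portal */
    pos = put_str(buf, pos, "");                /* unnamed statement */
    pos = put_i16(buf, pos, 0);                 /* no param format codes */
    pos = put_i16(buf, pos, 0);                 /* no parameter values */
    pos = put_i16(buf, pos, 0);                 /* no result format codes */
    end_msg(buf, len_at, pos);

    pos = begin_msg(buf, pos, 'E', &len_at);    /* Execute */
    pos = put_str(buf, pos, "");                /* unnamed portal */
    pos = put_i32(buf, pos, 0);                 /* no row-count limit */
    end_msg(buf, len_at, pos);

    pos = begin_msg(buf, pos, 'S', &len_at);    /* Sync */
    end_msg(buf, len_at, pos);

    return pos;
}
```

As Tom notes, the catch is Bind: without an intervening Describe round trip the client has to know the parameter and result formats up front.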
Re: [HACKERS] Command Triggers patch v18
Robert Haas robertmh...@gmail.com writes: Apropos of nothing and since I haven't found a particularly good time to say this in amidst all the technical discussion, I appreciate very much all the work you've been putting into this. Hey, thanks, I very much appreciate your support here! (1) is not hard to fix as soon as we set a policy, which is the most pressing thing we need to do in my mind, whatever we manage to commit in 9.2. I agree that setting a policy is enormously important, not so sure about how easy it is to fix, but we shall see. I think your proposal on the table is fixing it (the new event timing specs), it's all about fine tuning it now. My primary and overriding goal here is to make sure that we don't make any design decisions, in the syntax or otherwise, that foreclose the ability to add policies that we've not yet dreamed of to future Yeah we're not hard coding the list of possible event timings in the syntax. In terms of policies, there are two that seem to me to be very clear: at the very beginning of ProcessUtility, and at the very end of it, *excluding* recursive calls to ProcessUtility that are intended to handle subcommands. I think we could call the first start or command_start or ddl_command_start or post-parse or pre-locking or something along those lines, and the second end or command_end or ddl_command_end. I think it might be worth including ddl in there because the scope of the current patch really is just DDL (as opposed to say DCL) and having a separate set of firing points for DCL does not seem dumb. If we ever try to do anything with transaction control as you suggested upthread that's likely to also require special handling. Calling it, specifically, ddl_command_start makes the scope very clear. I'm not sure about that really.
First, the only reason why DCL commands are not supported in the current patch is because roles are global objects and we don't want role related behavior to depend on which database you're currently connected to. Then, I guess any utility command implementation will share the same principle of having a series of steps and allowing a user defined function to get called in between some of them. So adding support for commands that won't fit in the DDL timing specs we're trying to design now amounts to adding timing specs for those commands. I'm not sold that the timing specs should contain ddl or dcl or other things, really, I find that to be ugly, but I won't fight over that. (2) can be addressed with a simple blacklist that would be set just before calling user defined code, and cleaned when done running it. When in place the blacklist lookup is easy to implement in a central place (utility.c or cmdtrigger.c) and ereport() when the current command is in the blacklist. e.g. alter table would blacklist alter table and drop table commands on the current object id Here we're talking about a firing point that we might call ddl_post_name_lookup or something along those lines. I would prefer to handle this by making the following rules - here I'm assuming that we're talking about the case where the object in question is a relation: 1. The trigger fires immediately after RangeVarGetRelidExtended and before any other checks are performed, and especially before any CheckTableNotInUse(). This last is important, because what matters is whether the table's not in use AFTER all possible user-supplied code is executed. If the trigger opens or closes cursors, for example, what matters is whether there are any cursors left open *after* the triggers complete, not whether there were any open on entering the trigger.
The same is true of any other integrity check: if there's a chance that the trigger could change something that affects whether the integrity constraint gets fired, then we'd better be darn sure to fire the trigger before we check the integrity constraint. Ack. 2. If we fire any triggers at this point, we CommandCounterIncrement() and then re-do RangeVarGetRelidExtended() with the same arguments we passed before. If it doesn't return the same OID we got the first time, we abort with some kind of serialization error. We need to be a little careful about the phrasing of the error message, because it's possible for this to change during command execution even without command triggers if somebody creates a table earlier in the search path with the same unqualified name that was passed to the command, but I think it's OK for the existence of a command-trigger to cause a spurious abort in that very narrow case, as long as the error message includes some appropriate weasel language. Alternatively, we could try to avoid that corner case by rechecking only that a tuple with the right OID is still present and still has the correct relkind, but that seems like it might be a little less bullet-proof for no real functional gain. I can buy having a
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
Kyotaro HORIGUCHI horiguchi.kyot...@oss.ntt.co.jp writes: I'm sorry to have coded a silly bug. The previous patch has a bug in realloc size calculation. And separation of the 'connname patch' was incomplete in regtest. It is fixed in this patch. I've applied a modified form of the conname update patch. It seemed to me that the fault is really in the DBLINK_GET_CONN and DBLINK_GET_NAMED_CONN macros, which ought to be responsible for setting the surrounding function's conname variable along with conn, rconn, etc. There was actually a second error of the same type visible in the dblink regression test, which is also fixed by this more general method. regards, tom lane
Re: [HACKERS] Command Triggers patch v18
On Thu, Mar 29, 2012 at 4:21 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Robert Haas robertmh...@gmail.com writes: Well, preceding and before are synonyms, so I don't see any advantage in that change. But I really did mean AT permissions_checking time, not before or after it. That is, we'd have a hook where instead of doing something like this: aclresult = pg_class_aclcheck(reloid, GetUserId(), ACL_SELECT); ...we'd call a superuser-supplied trigger function and let it make the Wow. I don't think I like that at all. It's indeed powerful, but how do you go explaining and fixing unanticipated behavior with such things in production? It looks too much like an invitation to break a very careful design where each facility has to prove itself to get in. I'm thinking of things like extension whitelisting. When some unprivileged user says CREATE EXTENSION harmless, and harmless is marked as superuser-only, we might like to have a hook that gets called *at permissions-checking time* and gets to say, oh, well, that extension is on the white-list, so we're going to allow it. I think you can come up with similar cases for other commands, where in general the operation is restricted to superusers or database owners or table owners but in specific cases you want to allow others to do it. CREATE EVENT TRIGGER name ON event_name (event_subtype_name [, ...]) EXECUTE PROCEDURE function_name(args); I would amend that syntax to allow for a WHEN expr much like in the DML trigger case, where the expression can play with the variables exposed at the time of calling the trigger. create event trigger prohibit_some_ddl preceding timing spec when tag in ('CREATE TABLE', 'ALTER TABLE') execute procedure throw_an_error(); We could do it that way, but the tag in ('CREATE TABLE', 'ALTER TABLE') syntax will be tricky to evaluate. I'd really prefer NOT to need to start up and shut down the executor to figure out whether the trigger has to fire.
If we are going to do it that way then we need to carefully define what variables are available and what values they get. I think that most people's event-filtering needs will be simple enough to be served by a more declarative syntax. We could then maybe restrict that idea so that the syntax is more like when trigger variable in (literal[, ...]) So that we just have to store the array of strings and support only one operation there. We might want to go as far as special casing the object id to store an oid vector there rather than a text array, but we could also decide not to support per-oid command triggers in the first release and remove that from the list of accepted trigger variables. I guess that would make sense if you think there would ever be more than one choice for trigger variable. I'm not immediately seeing a use case for that, though - I was explicitly viewing the syntax foo (bar, baz) to mean foo when the-only-trigger-variable-that-makes-sense-given-that-we-are-talking-about-a-trigger-on-foo in (bar, baz). More generally, my thought on the structure of this is that you're going to have certain toplevel events, many of which will happen at only a single place in the code, like an object got dropped or a DDL command started or a DDL command ended. So we give those names, like sql_drop, ddl_command_start, and ddl_command_end. Inside your trigger procedure, the set of magic variables that is available will depend on which toplevel event you set the trigger on, but hopefully all firings of that toplevel event can provide the same magic variables. For example, at ddl_command_start time, you're just gonna get the command tag, but at ddl_command_end time you will get the command tag plus maybe some other stuff. Now, we COULD stop there. 
I mean, we could document that you can create a trigger on ddl_command_start and every DDL command will fire that trigger, and if the trigger doesn't care about some DDL operations, then it can look at the command tag and return without doing anything for the operations it doesn't care about. The only real disadvantage of doing it that way is speed, and maybe a bit of code complexity within the trigger. So my further thought was that we'd allow you to specify an optional filter list to restrict which events would fire the trigger, and the exact meaning of that filter list would vary depending on the toplevel event you've chosen - i.e. for ddl_command_start, the filter would be a list of commands, but for sql_drop it would be a list of object types. We could make that more complicated if we think that an individual toplevel event will need more than one kind of filtering. For example, if you wanted to filter sql_drop events based on the object type AND/OR the schema name, then the syntax I proposed would obviously not be adequate. I'm just not convinced we need that, especially because you'd then need to set up a dependency between the command trigger and the schema,
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
Marko Kreen mark...@gmail.com writes: My conclusion is that row-processor API is low-level expert API and quite easy to misuse. It would be preferable to have something more robust as end-user API, the PQgetRow() is my suggestion for that. Thus I see 3 choices: 1) Push row-processor as main API anyway and describe all dangerous scenarios in documentation. 2) Have both PQgetRow() and row-processor available in libpq-fe.h, PQgetRow() as preferred API and row-processor for expert usage, with proper documentation what works and what does not. 3) Have PQgetRow() in libpq-fe.h, move row-processor to libpq-int.h. I still am failing to see the use-case for PQgetRow. ISTM the entire point of a special row processor is to reduce the per-row processing overhead, but PQgetRow greatly increases that overhead. And it doesn't reduce complexity much either IMO: you still have all the primary risk factors arising from processing rows in advance of being sure that the whole query completed successfully. Plus it conflates "no more data" with "there was an error receiving the data" or "there was an error on the server side". PQrecvRow alleviates the per-row-overhead aspect of that but doesn't really do a thing from the complexity standpoint; it doesn't look to me to be noticeably easier to use than a row processor callback. I think PQgetRow and PQrecvRow just add more API calls without making any fundamental improvements, and so we could do without them. "There's more than one way to do it" is not necessarily a virtue. Second conclusion is that current dblink row-processor usage is broken when user uses multiple SELECTs in SQL as dblink uses plain PQexec(). Yeah. Perhaps we should tweak the row-processor callback API so that it gets an explicit notification that this is a new resultset. Duplicating PQexec's behavior would then involve having the dblink row processor throw away any existing tuplestore and start over when it gets such a call.
There's multiple ways to express that but the most convenient thing from libpq's viewpoint, I think, is to have a callback that occurs immediately after collecting a RowDescription message, before any rows have arrived. So maybe we could express that as a callback with valid res but columns set to NULL? A different approach would be to add a row counter to the arguments provided to the row processor; then you'd know a new resultset had started if you saw rowcounter == 0. This might have another advantage of not requiring the row processor to count the rows for itself, which I think many row processors would otherwise have to do. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] query cache
On Thu, Mar 29, 2012 at 4:57 PM, Tom Lane t...@sss.pgh.pa.us wrote: It's also probably worth keeping in mind the next time we bump the protocol version: it would be nice to have a way of doing prepare-bind-execute in a single protocol message, which I believe to be not possible at present. Huh? That's the standard way of doing it, actually. You send Prepare/Bind/Execute/Sync in one packet, and wait for results. That precludes some optimizations, like doing the whole thing under the same snapshot, since you don't know for sure when the Execute will be sent. And as a practical matter, it's slower. So there's some room for improvement there, any way you slice it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] HTTP Frontend? (and a brief thought on materialized views)
On Thu, Mar 29, 2012 at 11:04 PM, Andrew Dunstan and...@dunslane.net wrote: On 03/29/2012 10:37 AM, Dobes Vandermeer wrote: Hi guys, Something from Josh's recent blog post about summer of code clicked with me - the HTTP / SQL concept. 1. I've been in discussion with some people about adding simple JSON extract functions. We already have some (i.e. xpath()) for XML. 2. You might find htsql http://htsql.org/ interesting. As a reference, or should we just bundle / integrate that with PostgreSQL somehow?
Re: [HACKERS] HTTP Frontend? (and a brief thought on materialized views)
On Thu, Mar 29, 2012 at 6:25 PM, Dobes Vandermeer dob...@gmail.com wrote: 2. You might find htsql http://htsql.org/ interesting. As a reference, or should we just bundle / integrate that with PostgreSQL somehow? It's a totally different language layer without wide-spread popularity and, as of last year, still figuring out how to model updates and transactions, with a Python dependency. It is good work, but to build it in and to commit to maintaining it seems very premature, to my eyes. Also, it would tie someone -- possibly the htsql maintainers -- to an upgrade and maintenance policy not unlike Postgres itself, and that is a very expensive proposition while they are still figuring things out. -- fdr
Re: [HACKERS] HTTP Frontend? (and a brief thought on materialized views)
On Fri, Mar 30, 2012 at 3:59 AM, Daniel Farina dan...@heroku.com wrote: On Thu, Mar 29, 2012 at 12:57 PM, Daniel Farina dan...@heroku.com wrote: More technical concerns: * Protocol compression -- but a bit of sand in the gears is *which* compression -- for database workloads, the performance of zlib can be a meaningful bottleneck. I think if performance is the issue, people should use the native protocol. This HTTP thing should be more of a RAD / prototyping thing, I think. So people can be in their comfort zone when talking to the server. * Something similar to the HTTP Host header, so that one can route to databases without having to conflate database identity with the actual port being connected to. Yes, theoretically it can be done with weird startup packet gyrations, but that is firmly in the weird category. Isn't the URL good enough (/databases/dbname) or are you talking about having some sort of virtual host setup where you have multiple sets of databases available on the same port? Socialish (but no less important): * A standard frame extension format. For example, last I checked Postgres-XC, it required snapshot information to be passed, and it'd be nice that instead of having to hack the protocol that they could attach an X-Snapshot-Info header, or whatever. This could also be a nice way to pass out-of-band hint information of all sorts. I am sorry to admit I don't understand the terms frame extension format or Postgres-XC in this paragraph ... help? * HTTP -- and *probably* its hypothetical progeny -- are more common than FEBE packets, and a lot of incidental complexity of writing routers is reduced by the commonality of routing HTTP traffic. This is an interesting comment. I'm not sure how to test whether an HTTP based protocol will be better supported than a proprietary one, but I think we have enough other reasons that we can move forward. Well we have the reason that there's some kind of love affair with HTTP based protocols going on out there ...
might as well ride the wave while it's still rising (I hope). As for SPDY I can see how it may be helpful but as it is basically a different way to send HTTP requests (from what I understand) the migration to SPDY is mainly a matter of adding support for it to whatever HTTP library is used. Anyone have a thought on whether, for the HTTP server itself, it should be integrated right into the PostgreSQL server itself? Or would it be better to have a separate process that proxies requests to PostgreSQL using the existing protocol? Is there an API that can be used in both cases semi-transparently (i.e. the functions have the same name when linked right in, or when calling via a socket)? Cheers, Dobes
Re: [HACKERS] HTTP Frontend? (and a brief thought on materialized views)
On Fri, Mar 30, 2012 at 3:57 AM, Daniel Farina dan...@heroku.com wrote: On Thu, Mar 29, 2012 at 8:04 AM, Andrew Dunstan and...@dunslane.net wrote: Lastly, a case that cannot as easily be fixed without some more thinking is leveraging caching semantics of HTTP. I think people would really, really like that, if they could get away from having to hand-roll their own cache regeneration in common cases. I think this could be an interesting possibility. I wonder if the cost of a round-trip makes the cost of sending the actual data (vs a 304 response) irrelevant - as long as PostgreSQL is caching effectively internally it's possible it can send back the actual content as fast as it can calculate the ETag for it, so doing an extra query to check for changes could possibly slow things down, or at least eliminate the benefit. Probably worth trying, though. Supporting this automagically would require some kind of generic algorithm for calculating the Last-Modified time or ETag for a given query. As a default we may be able to just fall back on some internal global value that is guaranteed to change if the database has changed (I think the WAL files have some kind of serial number system we might use) so at the very least you could send back a 304 Not Modified if literally nothing in the database has changed. Narrowing that down to specific table timestamps might be possible, too, for simple queries. It depends what data is already available, I wouldn't want to add any extra bookkeeping for it. A more pragmatic approach may be to have the HTTP request include SQL code to generate an ETag or Last-Modified value to test with; the app could run that first and it would be used for caching. Something like calcLastModified=max(modified_date) on the query string.
Re: [HACKERS] Odd out of memory problem.
On tis, 2012-03-27 at 00:53 +0100, Greg Stark wrote: Hm. So my original plan was dependent on adding the state-merge function we've talked about in the past. Not all aggregate functions necessarily can support such a function but I think all or nearly all the builtin aggregates can. Certainly min, max, count, sum, avg, stddev, array_agg can, which are most of what people do. That would be a function which can take two state variables and produce a new state variable. This information could also be useful to have in PL/Proxy (or similar FDWs) to be able to integrate aggregate computation into the language. Currently, you always have to do the state merging yourself.
Re: [HACKERS] HTTP Frontend? (and a brief thought on materialized views)
On Thu, Mar 29, 2012 at 6:37 PM, Dobes Vandermeer dob...@gmail.com wrote: On Fri, Mar 30, 2012 at 3:59 AM, Daniel Farina dan...@heroku.com wrote: On Thu, Mar 29, 2012 at 12:57 PM, Daniel Farina dan...@heroku.com wrote: More technical concerns: * Protocol compression -- but a bit of sand in the gears is *which* compression -- for database workloads, the performance of zlib can be a meaningful bottleneck. I think if performance is the issue, people should use the native protocol. This HTTP thing should be more of a RAD / prototyping thing, I think. So people can be in their comfort zone when talking to the server. No. I do not think so. I think a reasonable solution (part of what MS is actually proposing to the IETF) is to make compression optional, or have clients support an LZ77 family format like Snappy. I get the impression that SPDY is waffling a little on this fact, it mandates compression, and definitely zlib, but is less heavy handed about pushing, say Snappy. Although I can understand why a Google-originated technology probably doesn't want to push another Google-originated implementation so hard, it would have been convenient for me for Snappy to have become a more common format. Isn't the URL good enough (/databases/dbname) or are you talking about having some sort of virtual host setup where you have multiple sets of databases available on the same port? Virtual hosts. Same port. * A standard frame extension format. For example, last I checked Postgres-XC, it required snapshot information to be passed, and it'd be nice that instead of having to hack the protocol that they could attach an X-Snapshot-Info header, or whatever. This could also be a nice way to pass out-of-band hint information of all sorts. I am sorry to admit I don't understand the terms frame extension format or Postgres-XC in this paragraph ... help? I'm being vague. Postgres-XC is a project to provide a shared-nothing sync-rep cluster for Postgres.
My last understanding of it is that it needed to pass snapshot information between nodes, and FEBE was expanded to make room for this, breaking compatibility, as well as probably being at least a small chore. It'd be nice if it wasn't necessary to do that and they had a much easier path to multiplex additional information into the connection. For my own purposes, I'm intensely interested in lacing the connection with: * EXPLAIN ANALYZE returns when the query has already run, getting both the actual timings *and* the results to the client. * Partition IDs, whereby you can find the right database and (potentially!) even influence how the queries are scoped to a tenant * Read-only vs. Write workload: As is well established, it's hard to know a-priori if a query is going to do a write. Fine. Let the client tag it, signal an error if something is wrong. Yes, theoretically all these features -- or just a general multiplexing scheme -- can be added to FEBE, but if we're even going to consider such an upheaval, maybe we can get a *lot* more bang for our buck by trying to avoid being unnecessarily different from the most common application-level protocol in existence, causing extraneous work for router and proxy authors. Notably, a vanilla Postgres database knows nothing about these extension headers. * HTTP -- and *probably* its hypothetical progeny -- are more common than FEBE packets, and a lot of incidental complexity of writing routers is reduced by the commonality of routing HTTP traffic. This is an interesting comment. I'm not sure how to test whether an HTTP based protocol will be better supported than a proprietary one, but I think we have enough other reasons that we can move forward. Well we have the reason that there's some kind of love affair with HTTP based protocols going on out there ... might as well ride the wave while it's still rising (I hope). At its core, what may be growing unnecessary is FEBE's own mechanism for delimiting messages.
All the other protocol actions -- such as shipping Binds, Executes, Describes, et al, are not going to be obsoleted or even changed by laying web-originated technologies under FEBE. Consider the wealth of projects, products, and services that filter HTTP vs FEBE, many quite old, now. In my mind, wave might be better rendered tsunami. The very real problem, as I see it, is that classic, stateless HTTP would be just too slow to be practical. As for SPDY I can see how it may be helpful but as it is basically a different way to send HTTP requests (from what I understand) the migration to SPDY is mainly a matter of adding support for it to whatever HTTP library is used. I think SPDY or like-protocols (there's only one other I can think of, the very recently introduced and hilariously branded SM from Microsoft. It's memorable, at least) are the only things that appear to give a crisp treatment to interactive, stateful workloads involving back-and-forth between client and
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
I've applied a modified form of the conname update patch. It seemed to me that the fault is really in the DBLINK_GET_CONN and DBLINK_GET_NAMED_CONN macros, which ought to be responsible for setting the surrounding function's conname variable along with conn, rconn, etc. There was actually a second error of the same type visible in the dblink regression test, which is also fixed by this more general method. Come to think of it, the patch was, to all appearances, merely a spot fix for a bug mingled in with the other part of the dblink patch. I totally agree with you. It should be dropped for this time and done another time. I'm sorry for bothering you with such a damn thing. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] HTTP Frontend? (and a brief thought on materialized views)
On Fri, Mar 30, 2012 at 10:55 AM, Daniel Farina dan...@heroku.com wrote: On Thu, Mar 29, 2012 at 6:37 PM, Dobes Vandermeer dob...@gmail.com wrote: On Fri, Mar 30, 2012 at 3:59 AM, Daniel Farina dan...@heroku.com wrote: On Thu, Mar 29, 2012 at 12:57 PM, Daniel Farina dan...@heroku.com wrote: More technical concerns: * Protocol compression -- but a bit of sand in the gears is *which* compression -- for database workloads, the performance of zlib can be a meaningful bottleneck. I think if performance is the issue, people should use the native protocol. No. I do not think so. I think a reasonable solution (part of what MS is actually proposing to the IETF) is to make compression optional, or have clients support an LZ77 family format like Snappy. I get the impression that SPDY is waffling a little on this fact, it mandates compression, and definitely zlib, but is less heavy handed about pushing, say Snappy. Although I can understand why a Google-originated technology probably doesn't want to push another Google-originated implementation so hard, it would have been convenient for me for Snappy to have become a more common format. Isn't the URL good enough (/databases/dbname) or are you talking about having some sort of virtual host setup where you have multiple sets of databases available on the same port? Virtual hosts. Same port. In that case, the frontend would not be tied to a specific PostgreSQL server, then? I think initially this might complicate things a bit, and you could solve it by putting an HTTP proxy in front to do the virtual hosts for you. * A standard frame extension format. For example, last I checked Postgres-XC, it required snapshot information to be passed, and it'd be nice that instead of having to hack the protocol that they could attach an X-Snapshot-Info header, or whatever. This could also be a nice way to pass out-of-band hint information of all sorts.
I am sorry to admit I don't understand the terms frame extension format or Postgres-XC in this paragraph ... help? It'd be nice if it wasn't necessary to do that and they had a much easier path to multiplex additional information into the connection. Ah, I get it - you want a way to add some extra information to the protocol in a backwards compatible way. HTTP (and SPDY) provides a standard way to do that. Makes sense. For my own purposes, I'm intensely interest in lacing the connection with: * EXPLAIN ANALYZE * Partition IDs * Read-only vs. Write workload I'll make a note of these and hash out the details a bit more once there's something working to add them to. As for SPDY I can see how it may be helpful but as it is basically a different way to send HTTP requests I think SPDY or like-protocols [...] give a crisp treatment to interactive, stateful workloads involving back-and-forth between client and server with multiplexing, fixing some problems with the hacks in HTTP-land from before. It sounds like at some level you're really talking about replacing the built-in protocol with SPDY because SPDY is possibly a better baseline than updating the existing protocol. That's an interesting idea, I think this project could evolve in that direction if there's demand for it.