Re: [HACKERS] Syntax for partitioning
2009/10/29 Itagaki Takahiro :
> I'd like to improve partitioning feature in 8.5. Kedar-san's previous
> work is wonderful, but I cannot see any updated patch.
> [... full proposal trimmed; see the original message below ...]
> Comments welcome.

+1

Pavel

> Regards,
> ---
> ITAGAKI Takahiro
> NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Syntax for partitioning
I'd like to improve the partitioning feature in 8.5. Kedar-san's previous
work is wonderful, but I cannot see any updated patch.
http://archives.postgresql.org/message-id/bd8134a40906080702s96c90a9q3bbb581b9bd0d...@mail.gmail.com

So, I'll take over the work if no one else is doing it.

I'm thinking to add syntax support first. Table partitioning has been
proposed many times, but it is still not in core. The reason is that it
is too difficult to build a complete partitioning feature all at once.
I think syntax support is a good start.

First, I will add syntax for CREATE TABLE and ALTER TABLE ADD/DROP
PARTITION. The syntax is borrowed from Oracle and MySQL; their common
characteristic is the use of "LESS THAN" in range partitioning. The
keyword "PARTITION" is added to the fully-reserved keyword list to
support ADD/DROP PARTITION.

This syntax is merely syntactic sugar for INHERITS with CHECK;
declarations are translated into CHECK constraints. I plan to adjust
pg_dump to dump partitioning definitions in the new format, but the
actual implementation will still be based on constraint exclusion. In
addition, hash partitioning is not implemented; the syntax is parsed,
but a "not implemented" error is raised for now.

Here is the syntax I propose:

ALTER TABLE table_name ADD PARTITION name ...;
ALTER TABLE table_name DROP PARTITION [IF EXISTS] name [CASCADE | RESTRICT];

Range partitioning:
CREATE TABLE table_name ( columns )
PARTITION BY RANGE ( a_expr )
(
  PARTITION name VALUES LESS THAN [(] const [)],
  PARTITION name VALUES LESS THAN [(] MAXVALUE [)]  -- overflow partition
);

List partitioning:
CREATE TABLE table_name ( columns )
PARTITION BY LIST ( a_expr )
(
  PARTITION name VALUES [IN] ( const [, ...] ),
  PARTITION name VALUES [IN] [(] DEFAULT [)]  -- overflow partition
);

Hash partitioning:
CREATE TABLE table_name ( columns )
PARTITION BY HASH ( a_expr )
PARTITIONS num_partitions;

CREATE TABLE table_name ( columns )
PARTITION BY HASH ( a_expr )
(
  PARTITION name,
  ...
);

Notes:
* Each partition can have optional WITH (...) and TABLESPACE clauses.
* '(' and ')' are optional, to support both Oracle and MySQL syntax.

Comments welcome.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
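To make the sugar concrete: since the proposal says declarations are translated into INHERITS plus CHECK constraints, a range declaration would presumably expand to something like the following (the table and partition names are invented for illustration; this is a sketch of the intended translation, not the patch's actual output):

```sql
-- Hypothetical expansion of:
--   CREATE TABLE measurement (logdate date, value int)
--   PARTITION BY RANGE (logdate)
--   ( PARTITION m2009 VALUES LESS THAN ('2010-01-01'),
--     PARTITION mmax  VALUES LESS THAN (MAXVALUE) );  -- overflow partition
CREATE TABLE measurement (logdate date, value int);

-- first partition: everything below the first bound
CREATE TABLE m2009 (
    CHECK (logdate < DATE '2010-01-01')
) INHERITS (measurement);

-- overflow partition: everything at or above the previous bound
CREATE TABLE mmax (
    CHECK (logdate >= DATE '2010-01-01')
) INHERITS (measurement);
```

With constraint_exclusion enabled, the planner can then skip partitions whose CHECK constraint contradicts the query's WHERE clause, which is the existing mechanism the proposal builds on.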
Re: [HACKERS] WIP: push AFTER-trigger execution into ModifyTable node
On Wed, Oct 28, 2009 at 8:45 PM, Tom Lane wrote:
> Marko Tiikkaja writes:
>> Like we've discussed before, WITH (.. RETURNING ..) is probably most
>> useful for moving rows from one table to another. When you're moving a
>> lot of rows around, there's some point where I believe this execution
>> strategy will be a lot slower than the traditional approach due to
>> storing the RETURNING results on disk. I've been thinking that in some
>> cases we could inline the CTE for this to actually be a quite
>> significant performance benefit, so I'm not too fancy about the
>> approach you're suggesting.
>
> Well, this is what we need to nail down *now*. Are we going to say that
> use of WITH(RETURNING) means you forfeit all guarantees about order of
> trigger firing? Short of that, I don't believe that it is sane to think
> about pipelining such things. And if we do do that, it sounds like a
> security hole to me, because the owner of the trigger isn't the one who
> agreed to forfeit predictability.

I don't see why either behavior would be a security hole; we get to
define how the system behaves, and users have to write their triggers to
cope with that behavior. We don't want to throw random roadblocks in the
way of sanity, but users are not entitled to assume that no future major
release of PG will have semantics that are in any way different from
whichever release they're now running, especially for features that don't
even exist in the current release. If you have a specific concern here,
maybe you could provide an example.

To be honest, I'm not entirely comfortable with either behavior.
Pipelining a delete out of one table into an insert into another table
seems VERY useful to me, and I'd like us to have a way to do that. On
the other hand, in more complex cases, the fact that the effects of a
statement are normally not visible to that statement could lead to some
fairly confusing behavior, especially when triggers are involved.

So I don't really know what the right thing is. What I really want is to
provide both behaviors, but I'm not sure there's any sensible way to do
that, and even if there were, it's not clear to me that users will know
which one they want.

...Robert
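For concreteness, the delete-into-insert pipeline discussed in this thread would, under the proposed WITH (... RETURNING ...) syntax, look roughly like this (table names are invented; at the time of this thread the syntax was only a proposal, not a released feature):

```sql
-- Hypothetical: move all rows older than a cutoff from one table to
-- another in a single statement, using the proposed writable-CTE syntax.
WITH moved AS (
    DELETE FROM events
    WHERE created < DATE '2009-01-01'
    RETURNING *
)
INSERT INTO events_archive
SELECT * FROM moved;
```

The trigger-ordering question in the thread is exactly about this case: whether the DELETE's AFTER triggers must all fire before the INSERT begins, or whether the two halves may be pipelined row by row.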
Re: [HACKERS] WIP: push AFTER-trigger execution into ModifyTable node
Marko Tiikkaja writes:
> Like we've discussed before, WITH (.. RETURNING ..) is probably most
> useful for moving rows from one table to another. When you're moving a
> lot of rows around, there's some point where I believe this execution
> strategy will be a lot slower than the traditional approach due to
> storing the RETURNING results on disk. I've been thinking that in some
> cases we could inline the CTE for this to actually be a quite
> significant performance benefit, so I'm not too fancy about the
> approach you're suggesting.

Well, this is what we need to nail down *now*. Are we going to say that
use of WITH(RETURNING) means you forfeit all guarantees about order of
trigger firing? Short of that, I don't believe that it is sane to think
about pipelining such things. And if we do do that, it sounds like a
security hole to me, because the owner of the trigger isn't the one who
agreed to forfeit predictability.

			regards, tom lane
Re: [HACKERS] WIP: push AFTER-trigger execution into ModifyTable node
Tom Lane wrote:
> So, before I go off and do that work: anybody have an objection to this
> line of development? The main implication of changing to this approach
> is that we'll be nailing down the assumption that each WITH (command
> RETURNING) clause acts very much like a separate statement for trigger
> purposes: it will fire BEFORE STATEMENT triggers at start, and AFTER
> STATEMENT triggers at end, and actually execute all non-deferred AFTER
> triggers, before we move on to executing the next WITH clause or the
> main query.

Like we've discussed before, WITH (.. RETURNING ..) is probably most
useful for moving rows from one table to another. When you're moving a
lot of rows around, there's some point where I believe this execution
strategy will be a lot slower than the traditional approach due to
storing the RETURNING results on disk. I've been thinking that in some
cases we could inline the CTE for this to actually be a quite
significant performance benefit, so I'm not too fancy about the approach
you're suggesting.

Regards,
Marko Tiikkaja
[HACKERS] WIP: push AFTER-trigger execution into ModifyTable node
In http://archives.postgresql.org/message-id/26545.1255140...@sss.pgh.pa.us
I suggested that we should push the actual execution (not just queuing)
of non-deferred AFTER triggers into the new ModifyTable plan node. The
attached patch does that, and seems like a nice improvement since it
removes knowledge of trigger handling from a number of other places.

However, the original objective was to allow EXPLAIN to associate trigger
runtimes with ModifyTable nodes, and I realized that this patch doesn't
accomplish that --- the trigger stats are still accumulated in the
executor-wide EState, not in the ModifyTable node. Right at the moment we
could cheat and have EXPLAIN print the trigger stats under ModifyTable
anyway, because there can be only one ModifyTable in any plan tree. But
that will fall down as soon as we try to let INSERT RETURNING and friends
execute within WITH clauses.

After poking around a bit I think it should be possible to keep the
trigger instrumentation data in the ModifyTable node instead of in
EState, and thereby allow EXPLAIN to know which node to blame the trigger
time on. This will require passing an extra parameter down from
nodeModifyTable into the trigger code, but none of those call paths are
very long. Still, it'll be a significantly more invasive patch by the
time it's done than what you see here.

So, before I go off and do that work: anybody have an objection to this
line of development? The main implication of changing to this approach is
that we'll be nailing down the assumption that each WITH (command
RETURNING) clause acts very much like a separate statement for trigger
purposes: it will fire BEFORE STATEMENT triggers at start, and AFTER
STATEMENT triggers at end, and actually execute all non-deferred AFTER
triggers, before we move on to executing the next WITH clause or the
main query.
			regards, tom lane

Index: src/backend/commands/explain.c
===
RCS file: /cvsroot/pgsql/src/backend/commands/explain.c,v
retrieving revision 1.192
diff -c -r1.192 explain.c
*** src/backend/commands/explain.c	12 Oct 2009 18:10:41 -	1.192
--- src/backend/commands/explain.c	28 Oct 2009 22:24:42 -
***************
*** 19,25 ****
  #include "commands/defrem.h"
  #include "commands/explain.h"
  #include "commands/prepare.h"
- #include "commands/trigger.h"
  #include "executor/instrument.h"
  #include "optimizer/clauses.h"
  #include "optimizer/planner.h"
--- 19,24 ----
***************
*** 354,363 ****
  	INSTR_TIME_SET_CURRENT(starttime);

- 	/* If analyzing, we need to cope with queued triggers */
- 	if (es->analyze)
- 		AfterTriggerBeginQuery();
-
  	/* Select execution options */
  	if (es->analyze)
  		eflags = 0;		/* default run-to-completion flags */
--- 353,358 ----
***************
*** 383,402 ****
  	ExplainPrintPlan(es, queryDesc);

  	/*
! 	 * If we ran the command, run any AFTER triggers it queued.  (Note this
! 	 * will not include DEFERRED triggers; since those don't run until end of
! 	 * transaction, we can't measure them.)  Include into total runtime.
  	 */
  	if (es->analyze)
  	{
- 		INSTR_TIME_SET_CURRENT(starttime);
- 		AfterTriggerEndQuery(queryDesc->estate);
- 		totaltime += elapsed_time(&starttime);
- 	}
-
- 	/* Print info about runtime of triggers */
- 	if (es->analyze)
- 	{
  		ResultRelInfo *rInfo;
  		bool		show_relname;
  		int			numrels = queryDesc->estate->es_num_result_relations;
--- 378,389 ----
  	ExplainPrintPlan(es, queryDesc);

  	/*
! 	 * Print info about runtime of triggers.  (Note this will not include
! 	 * DEFERRED triggers; since those don't run until end of transaction, we
! 	 * can't measure them.)
  	 */
  	if (es->analyze)
  	{
  		ResultRelInfo *rInfo;
  		bool		show_relname;
  		int			numrels = queryDesc->estate->es_num_result_relations;

Index: src/backend/commands/portalcmds.c
===
RCS file: /cvsroot/pgsql/src/backend/commands/portalcmds.c,v
retrieving revision 1.81
diff -c -r1.81 portalcmds.c
*** src/backend/commands/portalcmds.c	7 Oct 2009 16:27:18 -	1.81
--- src/backend/commands/portalcmds.c	28 Oct 2009 22:24:42 -
***************
*** 264,270 ****
  	PG_TRY();
  	{
  		CurrentResourceOwner = portal->resowner;
- 		/* we do not need AfterTriggerEndQuery() here */
  		ExecutorEnd(queryDesc);
  		FreeQueryDesc(queryDesc);
  	}
--- 264,269 ----
***************
*** 371,377 ****
  	 * Now shut down the inner executor.
  	 */
  	portal->queryDesc = NULL;	/* prevent double shutdown */
- 	/* we do not need AfterTriggerEndQuery() here */
  	ExecutorEnd(queryDesc);
  	FreeQueryDesc(queryDesc);
--- 370,375 ----

Index: src/backend/commands/trigger.c
===
RCS file: /cvsroot/pgsql/src/backend/commands/trigger.c,v
retrieving revision 1.256
diff -c -r1.256 trigger.c
*** src/backend/commands/trigger.c
Re: [HACKERS] Show schema size with \dn+
On Wednesday, October 28, 2009 at 15:11:31, Anders Steinlein wrote:
> Is there any interest in expanding \dn+ to show schema size, similar
> to table sizes using \dt+ in 8.4? We use separate schemas for each
> user, so this would allow us to quickly look up the sizes of each
> user's data.
>
> I have little experience with C and none with the PostgreSQL code base
> -- where should I look to have a go at this?

I would say source file src/bin/psql/describe.c, function listSchemas.

--
Guillaume.
 http://www.postgresqlfr.org
 http://dalibo.com
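Until \dn+ grows such a column, the per-schema size can be computed with plain SQL along these lines (using the standard pg_total_relation_size and pg_size_pretty functions; this sums ordinary tables only, which is roughly what \dt+ reports):

```sql
-- Approximate on-disk size of each schema: the sum of the total relation
-- size (heap + indexes + TOAST) of every ordinary table in that schema.
SELECT n.nspname AS schema,
       pg_size_pretty(sum(pg_total_relation_size(c.oid))::bigint) AS size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
GROUP BY n.nspname
ORDER BY sum(pg_total_relation_size(c.oid)) DESC;
```

A \dn+ implementation in describe.c would presumably issue a query of this general shape.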
Re: [HACKERS] Parsing config files in a directory
On Tue, 2009-10-27 at 00:38 -0400, Greg Smith wrote:
> new feature

One additional point that would be useful is a way to match up the usage
of custom_variable_classes with this new style of .conf file processing.

At the moment, if you wish to add a custom variable class, everybody
needs to edit the *same* parameter. Finding which one to edit could be a
little difficult with a whole directory to search in.

I propose a new form of processing for that variable: each new parameter
instance is added to the last one, rather than replacing it, e.g.

custom_variable_classes = 'x'
custom_variable_classes = 'y'
custom_variable_classes = 'z'

is equivalent to

custom_variable_classes = 'x,y,z'

That allows NewFeatureX to drop in a file called "newfeaturex.conf",
which looks like this:

custom_variable_classes = 'newfeaturex'
newfeaturex.param1 = x
newfeaturex.param2 = y
newfeaturex.param3 = z

This requires no editing of any other files, just a straight drop-in.
That will make it much easier to produce real installers/deinstallers
for add-in modules.

--
Simon Riggs www.2ndQuadrant.com
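The accumulate-rather-than-replace semantics proposed above can be sketched as follows (a toy parser for illustration only -- the real server parses its files with guc-file.l, and this simplified format handling is an assumption of the sketch):

```python
def read_settings(lines):
    """Parse simple  key = 'value'  lines.

    Ordinary GUCs keep the server's last-instance-wins behavior, but
    custom_variable_classes accumulates across instances, as proposed
    in this thread.
    """
    settings = {}
    classes = []  # accumulated custom_variable_classes values
    for raw in lines:
        line = raw.split('#', 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        key, _, value = line.partition('=')
        key = key.strip()
        value = value.strip().strip("'")
        if key == 'custom_variable_classes':
            # proposed behavior: append to the previous instance
            classes.extend(v.strip() for v in value.split(',') if v.strip())
            settings[key] = ','.join(classes)
        else:
            settings[key] = value  # normal GUCs: last instance wins
    return settings
```

Feeding it three separate `custom_variable_classes` lines yields the merged `'x,y,z'` value, while a twice-set ordinary GUC keeps only its last value.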
Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1
On Wed, 2009-10-28 at 15:31 -0400, Tom Lane wrote:
> Hannu Krosing writes:
>> I had never checked the docs for hash functions, but I had assumed, that
>> internal functions are prefixed by pg_ and anything else is public, free
>> to use functionality.
>
> Sure, it's free to use. It's not free to assume that we promise never
> to change it.
>
>> Changing hash functions also makes in-place upgrades a lot harder, as
>> they can't be done incrementally anymore for tables which use hash
>> indexes.
>
> Hash indexes are so far from being production-grade that this argument
> is not significant.

AFAIK in-place upgrade is also not quite production-grade, so this was
meant as a forward-looking note for the next time the hashxxx functions
change.

> regards, tom lane

--
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
   Services, Consulting and Training
Re: [HACKERS] Parsing config files in a directory
Greg Smith wrote:
> On Wed, 28 Oct 2009, Josh Berkus wrote:
>> It's the basic and unsolvable issue of how do you have a file which is
>> both perfectly human-readable-and-editable *and* perfectly
>> machine-readable-and-editable at the same time.
>
> Let's see...if I remember correctly from the last two rounds of this
> discussion, this is the point where someone pops up and says that
> switching to XML for the postgresql.conf will solve this problem.
> Whoever does that this time goes into the ring with Kevin and I, but
> they don't get a club. (All fight proceeds to benefit SPI of course.)

That's precisely why I didn't get into this discussion (you guys are
scary). It really does seem like deja vu all over again. I'm usually
good for a given argument once.

cheers

andrew
Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1
On Wed, 2009-10-28 at 12:51 -0700, Jeff Davis wrote:
> On Wed, 2009-10-28 at 21:09 +0200, Hannu Krosing wrote:
>> Is at least the fact that they "are undocumented, have changed in the
>> past, and are likely to change again in the future" documented ?
>
> That's a little confusing to me: how do we document that something is
> undocumented? And where do we stop?

My previous e-mail message documents that the undocumentedness is
undocumented, so no need to go any further here ;)

Though undocumented, the hash functions are easily discoverable by doing
\df *hash* in psql.

>> Hashing is a quite fundamental thing in computing, so I was quite
>> surprised to find out it had silently changed.
>
> There are many reasons to use a hash, and we don't want people to use
> these functions for the wrong purpose.

I don't think that not documenting a hash function helps here at all.

> I have seen people use a
> performance hash for security purposes before, and I had to demonstrate
> some hash collisions to show why that was a bad idea.

I've seen people use CRC32 as a hash and then hit a collision in 15
tries with quite large keys.

> So, if we do provide documented functions, it should be done carefully.

Any user-visible behavior change should be done carefully, even if the
original behavior is not documented.

A careful upgrade of the hashxxx functions would have kept the old
functions and introduced the new ones with a _v2 suffix, then used those
in the appropriate places. The old ones could be kept for a few
versions, perhaps with a deprecation warning, and then moved to contrib
for a few more versions. Doing it this way could leave them
"undocumented" and still not break people's applications in mysterious
ways.

> Trying to develop and document a set of standardized, stable hash
> functions covering a wide range of possible use cases sounds like it may
> be better served by an extension.

I guess there are enough security/crypt/strong hashes in the pgcrypto
package, so that should not be a problem.

--
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
   Services, Consulting and Training
Re: [HACKERS] Parsing config files in a directory
On Wed, Oct 28, 2009 at 4:52 PM, Greg Smith wrote:
> On Wed, 28 Oct 2009, Robert Haas wrote:
>> It would be completely logical to break up the configuration file into
>> subfiles by TOPIC. That would complicate things for tool-writers
>> because they would need to get each setting into the proper file, and
>> we currently don't have any infrastructure for that.
>
> Already done:
>
> # select name,category from pg_settings limit 1;
>        name       |              category
> ------------------+---------------------------------------------------
>  add_missing_from | Version and Platform Compatibility / Previous PostgreSQL Versions
>
> You could make one per category, and pgtune for example already knows
> all this info. The somewhat arbitrary category assignments Josh put
> things into are what Peter was complaining about upthread. Questions
> like "is 'effective_cache_size' a memory parameter or an optimizer
> one?" show why this is not trivial to do well.

I stand corrected. I think the basic thrust of the paragraph stands -
this is not why people are asking for the feature, or if it is, that
hasn't been articulated or discussed.

...Robert
Re: [HACKERS] Parsing config files in a directory
On Wed, 28 Oct 2009, Robert Haas wrote:
> It would be completely logical to break up the configuration file into
> subfiles by TOPIC. That would complicate things for tool-writers
> because they would need to get each setting into the proper file, and
> we currently don't have any infrastructure for that.

Already done:

# select name,category from pg_settings limit 1;
       name       |              category
------------------+---------------------------------------------------
 add_missing_from | Version and Platform Compatibility / Previous PostgreSQL Versions

You could make one per category, and pgtune for example already knows
all this info. The somewhat arbitrary category assignments Josh put
things into are what Peter was complaining about upthread. Questions
like "is 'effective_cache_size' a memory parameter or an optimizer
one?" show why this is not trivial to do well.

--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Parsing config files in a directory
On Wed, Oct 28, 2009 at 4:24 PM, Josh Berkus wrote:
>> Let's see...if I remember correctly from the last two rounds of this
>> discussion, this is the point where someone pops up and says that
>> switching to XML for the postgresql.conf will solve this problem.
>> Whoever does that this time goes into the ring with Kevin and I, but
>> they don't get a club. (All fight proceeds to benefit SPI of course.)
>
> XML is so last-week. JSON!
>
> Oh, wait, we're PostgreSQL, we're not that mainstream. YAML!

Definitely ASN.1

...Robert
Re: [HACKERS] alpha2 bundled -- please verify
On Wed, 2009-10-21 at 23:26 +0300, Peter Eisentraut wrote:
> Alpha2 has been bundled and is available at
>
> http://developer.postgresql.org/~petere/alpha/
>
> Please check that it is sane.

Peter, could you please provide an md5sum for the alpha2 tarball? I know
we missed it in alpha1. It would be better if the original packager
uploaded the md5sum.

Also, I'll be happy if you can also upload a .bz2 file (for the lazy RPM
packager who does not want to play with his spec files a lot).

Regards,
--
Devrim GÜNDÜZ, RHCE
Command Prompt - http://www.CommandPrompt.com
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz
Re: [HACKERS] inefficient use of relation extension?
Tom Lane wrote:
> Alvaro Herrera writes:
>> I regularly (several times a day) see five or six processes all with
>> pg_locks locktype=extend granted=f on the same table, waiting for a
>> long time.
>
> I'm not sure what's causing that, but I *seriously* doubt that adding
> new pages to FSM right away would make it better. What it sounds like
> is someone is getting hung up while holding the lock. You should try to
> investigate who's got the lock when this happens, and what they're
> doing or waiting for.

After some more research, these facts have arisen:

- the relation extension lock in question is on a toast table
- the entries stored in that table are long enough that they need more
  than one page
- the BufMappingLock is seen as severely contended among the various
  processes trying to extend the table

So the dozen-plus processes fight over the BufMapping lwlock while
attempting to extend the toast table, to get a free buffer for the new
page; they do this one page at a time, and then go back to do the same
thing over and over. Shared_buffers is large (4.6 GB), and I'm not clear
whether this is just inefficiency in the 8.1 bufmgr code, or whether
bgwriter is not aggressive enough about clearing pages from the LRU end
of the buffer pool.

I'm not really sure what the right way to attack this problem is, but
getting off 8.1 has now become a priority.

sinval was tracked too, and we found out that it's not an issue at all.
Thanks everyone.

--
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
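For reference, the ungranted extension locks described in this thread can be spotted with a query along these lines (column names as in the pg_locks system view; joining to pg_class to resolve the table name is optional):

```sql
-- Sessions currently waiting on a relation extension lock,
-- with the name of the relation being extended.
SELECT l.pid, c.relname, l.mode
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
WHERE l.locktype = 'extend'
  AND NOT l.granted;
```

Several rows naming the same relation at once is the symptom Alvaro reports: multiple backends queued behind a single extension lock holder.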
Re: [HACKERS] Parsing config files in a directory
> Let's see...if I remember correctly from the last two rounds of this
> discussion, this is the point where someone pops up and says that
> switching to XML for the postgresql.conf will solve this problem.
> Whoever does that this time goes into the ring with Kevin and I, but
> they don't get a club. (All fight proceeds to benefit SPI of course.)

XML is so last-week. JSON!

Oh, wait, we're PostgreSQL, we're not that mainstream. YAML!

--Josh
Re: [HACKERS] Parsing config files in a directory
On Wed, Oct 28, 2009 at 3:28 PM, Tom Lane wrote:
> Josh Berkus writes:
>> Kevin,
>>> Perhaps the ease of writing something like that with sed or perl has
>>> caused me to underestimate the effort required in C. I am curious
>>> whether you actually mean that, or said it for rhetorical effect.
>
>> I actually mean that. It *looks* easy in perl, and in fact *is* easy
>> for *your* postgresql.conf which you control. But writing a parser for
>> every postgresql.conf which exists in the world, no matter how someone
>> has hacked it up creatively? No matter how they've handled upgrades?
>
> The issue isn't even with writing a parser. The conf file is certainly
> machine-parsable; guc-file.l is an existence proof, and the relatively
> slow rate of change of that file indicates that it's been a reasonably
> stable format over time.

Right.

> The issue is that if you want to modify the
> file while preserving comments, commenting out superseded entries,
> putting new entries where the user would expect to find them, etc etc,
> you have a hard AI problem in front of you.

Right. In other words, it's not possible. You can drive yourself crazy
trying to approximate it, but it is a hopeless waste of time.

> This is why Robert keeps
> harping on the default commentary being a problem --- if you removed all
> comments (and didn't care about ordering etc), it would be easier.

Yes - and we even had some consensus that this was a good idea, at one
point. Maybe not complete, precise consensus on every detail, but
certainly enough to have a conversation about it.

> But short of telling people who prefer $EDITOR to get lost, that's not
> going to work.

This is where I get off the train. Preferring $EDITOR is not the same
thing as feeling that we need 500 lines of comments in the default file.
There may be some people who hold both opinions, of course.

> I think the point of the discussions here is that we want to establish
> an alternate config file (or set of config files) in which the
> expectations are different: no promise to preserve any comments, no
> intention to be human-friendly for editing, etc. In one sense this
> would be the same machine-readable format, since the backend is still
> going to parse it with guc-file.l; but changing the human expectations
> can make the editing problem much simpler.

I don't think this idea is without merit, but I don't think it's a
silver bullet, either. If you can change the human expectations for some
file that gets processed along with postgresql.conf, you can change the
expectations for postgresql.conf itself. In fact, you'll have to: adding
more files is BY DEFINITION going to change the interpretation of
postgresql.conf. It will either be the magic bullet file that overrides
the other file, or vice versa - rather than, as is the case in a default
install today, being THE configuration file.

One of the issues that we need to face is: how many new files? There
seems to be some sentiment that we can just "drop in" new files and
expect things to work. I think that's likely to lead to chaos. Having
TWO files - one for $EDITOR and one for tools - may be manageable. There
will be some user confusion as to which one is in charge, but there are
only two choices, so maybe it's not too bad. But having a whole
directory full of files emitted by different tools sounds like a
disaster, and therefore it seems to me that there is no getting around
the need to have a tool which can merge new settings into an existing
configuration file.

It would be completely logical to break up the configuration file into
subfiles by TOPIC. That would complicate things for tool-writers because
they would need to get each setting into the proper file, and we
currently don't have any infrastructure for that. But that's not why
people want this feature anyway. What they want is to be able to deposit
new settings and have them take effect without parsing a config file.

But they can do that today. Just open postgresql.conf for append, write
a newline in case the file didn't already end with one, write your
settings, and close the file. Your settings win because they are last.
The problem is - now you've left a mess for someone else to clean up.
Over time, duplicates will accumulate. The order of settings won't be
preserved. Comments won't be properly updated. But writing to another
file doesn't actually fix any of that. Merging settings (either in
postgresql.conf or in a separate persistent.conf) does, at least in
part.

...Robert
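The append-only approach described in this thread can be sketched in a few lines (a toy illustration only; a real tool would also need file locking and a reload signal afterwards, and the function name here is invented):

```python
def append_settings(conf_path, settings):
    """Append settings to a postgresql.conf-style file.

    Because the server honors the last occurrence of a duplicated GUC,
    the appended values win -- at the cost of accumulating duplicates
    over time, exactly the mess described in the thread.
    """
    with open(conf_path, 'a+') as f:
        f.seek(0)
        contents = f.read()
        # start on a fresh line in case the file doesn't end with one
        if contents and not contents.endswith('\n'):
            f.write('\n')  # 'a' mode: writes always go to end of file
        for name, value in settings.items():
            f.write("%s = '%s'\n" % (name, value))
```

The appended setting ends up last in the file and therefore takes effect, while the stale earlier copy lingers as a duplicate.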
Re: [HACKERS] Parsing config files in a directory
On Wed, 28 Oct 2009, Josh Berkus wrote:

> It's the basic and unsolvable issue of how do you have a file which is
> both perfectly human-readable-and-editable *and* perfectly
> machine-readable-and-editable at the same time.

Let's see...if I remember correctly from the last two rounds of this discussion, this is the point where someone pops up and says that switching to XML for the postgresql.conf will solve this problem. Whoever does that this time goes into the ring with Kevin and me, but they don't get a club. (All fight proceeds to benefit SPI of course).

-- * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1
On Wed, 2009-10-28 at 21:09 +0200, Hannu Krosing wrote: > Is at least the fact that they "are undocumented, have changed in the > past, and are likely to change again in the future" documented ? That's a little confusing to me: how do we document that something is undocumented? And where do we stop? > Hashing is a quite fundamental thing in computing, so I was quite > surprised to find out it had silently changed. There are many reasons to use a hash, and we don't want people to use these functions for the wrong purpose. I have seen people use a performance hash for security purposes before, and I had to demonstrate some hash collisions to show why that was a bad idea. So, if we do provide documented functions, it should be done carefully. Trying to develop and document a set of standardized, stable hash functions covering a wide range of possible use cases sounds like it may be better served by an extension. Regards, Jeff Davis
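The kind of demonstration Jeff mentions is easy to reproduce. Here is a sketch in Python using a made-up toy hash (nothing to do with PostgreSQL's hashtext) of why a performance hash must not be pressed into security service: anyone who controls the inputs can manufacture collisions at will.

```python
import hashlib

def toy_hash(s: str) -> int:
    # A deliberately naive performance-style hash: sum of the bytes.
    # (A hypothetical example for illustration -- NOT hashtext.)
    return sum(s.encode()) & 0xFFFFFFFF

# Any two strings that are byte-permutations of each other collide,
# so an attacker who controls inputs can force collisions trivially:
collision = toy_hash("listen") == toy_hash("silent")

# A cryptographic hash is designed to make finding such pairs infeasible:
distinct = (hashlib.sha256(b"listen").hexdigest()
            != hashlib.sha256(b"silent").hexdigest())
```

A hash like this is perfectly adequate for spreading keys across hash buckets, and completely useless as an integrity or authentication check — which is the distinction at issue in this thread.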
Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1
Kenneth Marshall writes: > On Wed, Oct 28, 2009 at 03:31:17PM -0400, Tom Lane wrote: >> Hash indexes are so far from being production-grade that this argument >> is not significant. > In addition that change from 8.3 -> 8.4 to store only the hash and not > the value in the index means that a reindex would be required in any event. Indeed, and I fully expect there will be some more on-disk format changes required before we get to the point where hash indexes are actually interesting for production. If we start insisting that they be in-place-upgradable now, we will pretty much guarantee that they never become useful enough to justify the restriction :-( (As examples, the hash bucket size probably needs revisiting, and we ought to think very hard about whether we shouldn't switch to 64-bit hash values. And that's not even considering some of the more advanced suggestions that have been made.) regards, tom lane
Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1
On Wed, Oct 28, 2009 at 03:31:17PM -0400, Tom Lane wrote: > Hannu Krosing writes: > > I had never checked the docs for hash functions, but I had assumed, that > > internal functions are prefixed by pg_ and anything else is public, free > > to use functionality. > > Sure, it's free to use. It's not free to assume that we promise never > to change it. > > > Changing hash functions also makes in-place upgrades a lot harder, as > > they can't be done incrementally anymore for tables which use hash > > indexes. > > Hash indexes are so far from being production-grade that this argument > is not significant. > > regards, tom lane In addition, the change from 8.3 -> 8.4 to store only the hash and not the value in the index means that a reindex would be required in any event. Cheers, Ken
Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1
Hannu Krosing writes: > I had never checked the docs for hash functions, but I had assumed, that > internal functions are prefixed by pg_ and anything else is public, free > to use functionality. Sure, it's free to use. It's not free to assume that we promise never to change it. > Changing hash functions also makes in-place upgrades a lot harder, as > they can't be done incrementally anymore for tables which use hash > indexes. Hash indexes are so far from being production-grade that this argument is not significant. regards, tom lane
Re: [HACKERS] Parsing config files in a directory
Josh Berkus writes: > Kevin, >> Perhaps the ease of writing something like that with sed or perl has >> caused me to underestimate the effort required in C. I am curious >> whether you actually mean that, or said it for rhetorical effect. > I actually mean that. It *looks* easy in perl, and in fact *is* easy > for *your* postgresql.conf which you control. But writing a parser for > every postgresql.conf which exists in the world, no matter how someone > has hacked it up creatively? No matter how they've handled upgrades? The issue isn't even with writing a parser. The conf file is certainly machine-parsable; guc-file.l is an existence proof, and the relatively slow rate of change of that file indicates that it's been a reasonably stable format over time. The issue is that if you want to modify the file while preserving comments, commenting out superseded entries, putting new entries where the user would expect to find them, etc etc, you have a hard AI problem in front of you. This is why Robert keeps harping on the default commentary being a problem --- if you removed all comments (and didn't care about ordering etc), it would be easier. But short of telling people who prefer $EDITOR to get lost, that's not going to work. I think the point of the discussions here is that we want to establish an alternate config file (or set of config files) in which the expectations are different: no promise to preserve any comments, no intention to be human-friendly for editing, etc. In one sense this would be the same machine-readable format, since the backend is still going to parse it with guc-file.l; but changing the human expectations can make the editing problem much simpler. regards, tom lane
Re: [HACKERS] Parsing config files in a directory
On Wed, Oct 28, 2009 at 12:08 PM, Josh Berkus wrote: >> Perhaps the ease of writing something like that with sed or perl has >> caused me to underestimate the effort required in C. I am curious >> whether you actually mean that, or said it for rhetorical effect. > > I actually mean that. It *looks* easy in perl, and in fact *is* easy > for *your* postgresql.conf which you control. But writing a parser for > every postgresql.conf which exists in the world, no matter how someone > has hacked it up creatively? No matter how they've handled upgrades? > For every version of PostgreSQL? That requires writing a full parser > with grammar and near-Turing capabilities.

I think we're getting distracted by the basic parsing questions. These are actually solvable -- pgadmin solves them today even. I think the bigger problem is the semantic questions. If I've lowered random_page_cost and your tool decides it should raise seq_page_cost, should it raise random_page_cost proportionally from my setting or to the absolute value it calculates? When it does, will I be annoyed to see my settings overwritten? What if I set some of the cpu_* parameters based on my random_page_cost setting and now that you've overwritten my random_page_cost setting they're all out of whack?

And not all programs editing these files will be equally intelligent. Say I post a snippet of configuration and say to drop it in wholesale into your postgresql.conf.d. Then the user runs autotune which drops a configuration in after it which overrides those settings. Then later I post an updated snippet and the user replaces the original snippet. If they're in separate files and he put the snippet in before the autotune configuration then he doesn't have to worry if the new snippet contains things which autotune overrode. They'll still override the new settings.
If you keep them separate then the actual settings may not be in sync but at least I can see each group of settings and understand what they were trying to do. The precedence remains the same. It also means the database could give warnings if files are overriding earlier files if we want that. -- greg
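Greg's scenario — per-tool snippet files applied in a fixed order, with later files overriding earlier ones while each file stays legible on its own — can be sketched like this. A Python illustration under assumed rules (files applied in sorted name order, simple `name = value` lines only); the file names `10-snippet.conf` and `20-autotune.conf` are hypothetical.

```python
import glob
import os
import re
import tempfile

def merge_conf_dir(conf_dir):
    # Apply every *.conf file in sorted name order; for each setting the
    # last file to mention it wins.  Track the winning file so overrides
    # stay visible, much as pg_settings exposes a sourcefile column.
    effective = {}
    for path in sorted(glob.glob(os.path.join(conf_dir, "*.conf"))):
        with open(path) as f:
            for line in f:
                line = line.split("#", 1)[0].strip()
                m = re.match(r"(\w+)\s*=\s*(\S+)", line)
                if m:
                    effective[m.group(1)] = (m.group(2), os.path.basename(path))
    return effective

# Demo with two hypothetical snippets: a hand-installed one and a later
# autotune-generated one that overrides part of it.
d = tempfile.mkdtemp()
with open(os.path.join(d, "10-snippet.conf"), "w") as f:
    f.write("random_page_cost = 2.0\n")
with open(os.path.join(d, "20-autotune.conf"), "w") as f:
    f.write("random_page_cost = 3.5\nwork_mem = 32MB\n")

merged = merge_conf_dir(d)
```

Both files keep their own contents intact; replacing the hand-installed snippet never disturbs the autotune file, and the winning source for each setting is recorded rather than lost.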
Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1
On Wed, 2009-02-11 at 11:22 -0500, Tom Lane wrote: > Asko Oja writes: > > Did this change hashtext() visible to users? We have been using it quite > > widely for partitioning our databases. If so then it should be marked quite > > visibly in release notes as there might be others who will be hit by this. > > The hash functions are undocumented, have changed in the past, and are > likely to change again in the future. If you are using them in a way > that depends on them to give the same answers across versions, you'd > better stop.

Is at least the fact that they "are undocumented, have changed in the past, and are likely to change again in the future" documented? I'm sure this is something that has hit unwary users in the past and will hit again in the future, so some words about it in the docs would be appropriate. A search for "hashtext" on http://www.postgresql.org/docs/8.4/interactive/index.html returned no results, so I guess even their "undocumented, will surprise you" status is not documented.

Hashing is a quite fundamental thing in computing, so I was quite surprised to find out it had silently changed. I had never checked the docs for hash functions, but I had assumed that internal functions are prefixed by pg_ and anything else is public, free-to-use functionality.

Changing hash functions also makes in-place upgrades a lot harder, as they can't be done incrementally anymore for tables which use hash indexes.

> regards, tom lane

-- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training
Re: [HACKERS] Parsing config files in a directory
Kevin,

> Perhaps the ease of writing something like that with sed or perl has
> caused me to underestimate the effort required in C. I am curious
> whether you actually mean that, or said it for rhetorical effect.

I actually mean that. It *looks* easy in perl, and in fact *is* easy for *your* postgresql.conf which you control. But writing a parser for every postgresql.conf which exists in the world, no matter how someone has hacked it up creatively? No matter how they've handled upgrades? For every version of PostgreSQL? That requires writing a full parser with grammar and near-Turing capabilities.

> Well, I wouldn't vote against it since it seems to do me no harm; I
> was just confused at the repeated assertion that update-in-place was
> such a hard problem.

It's the basic and unsolvable issue of how do you have a file which is both perfectly human-readable-and-editable *and* perfectly machine-readable-and-editable at the same time.

--Josh Berkus
Re: [HACKERS] Parsing config files in a directory
On Wed, Oct 28, 2009 at 10:28 AM, Greg Smith wrote: > The postgresql.conf file being modified is generated by initdb, and it's > already being customized per install by the initdb-time rules like detection > for maximum supported shared_buffers. It isn't one of the files installed by > the package manager where the logic you're describing kicks in. The > conflict case would show up, to use a RHEL example, if I edited a > /etc/sysconfig/postgresql file and then a changed version of that file > appeared upstream. Stuff in PGDATA is all yours and not tracked as a config > file.

Well, putting configuration files in PGDATA is itself a packaging violation. I'm talking about /etc/postgresql.conf. Yes it's possible for packages to simply opt out of the configuration file management which at least means they're not actively causing problems -- but it's a cheat, it means it's giving up on providing the user with useful upgrades of configuration files.

-- greg
Re: [HACKERS] Parsing config files in a directory
Greg Smith writes: > If as you say the only right way to do this is to use the flex logic, that > just reinforced how high the bar is for someone who wants to write a tool > that modifies the file. Yup, exactly. Personally I think that trying to auto-modify postgresql.conf is insane. The whole and entire reason behind this discussion is that we want the tools modifying OTHER files, for which we will establish different and much simpler rules for what is allowed. regards, tom lane
Re: [HACKERS] Parsing config files in a directory
Josh Berkus wrote: > The precedence issues you (and Robert) are citing are no different > from what we have currently in a single file. I think that's *why* we're mentioning it. This would seem to be the juncture to look for ways to improve that, not just settle for "no worse" -- but perhaps that's not possible. > If someone here thinks writing a tool which reliably parses and > re-writes a hand-written PostgresQL.conf and runs on all the OSes we > support is *easy*, then please write it for me! I'll happly use > such a tool. But after wasting a couple dozen hours on the problem, > I won't write one. Perhaps the ease of writing something like that with sed or perl has caused me to underestimate the effort required in C. I am curious whether you actually mean that, or said it for rhetorical effect. > Otherwise, please let us have our directory so that we can > experiment with easy-to-write-and-revise autoconfig tools. Well, I wouldn't vote against it since it seems to do me no harm; I was just confused at the repeated assertion that update-in-place was such a hard problem. Some of the people saying that seem to regularly eat problems which seem much harder than that (to me, anyway) for lunch. That seemed to suggest there could be other reasons for wanting the directory approach which weren't getting proper focus. If we solve the wrong problem, the solution is likely to be suboptimal for the real issues. -Kevin
Re: [HACKERS] Parsing config files in a directory
On Wed, 28 Oct 2009, Greg Stark wrote:

> It's also a blatant violation of packaging rules for Debian if not every
> distribution. If you edit the user's configuration file then there's no
> way to install a modified default configuration file. You can't tell the
> automatic modifications apart from the user's modifications. So the user
> will get a prompt asking if he wants the new config file or to keep his
> modifications which he never remembered making.

The postgresql.conf file being modified is generated by initdb, and it's already being customized per install by the initdb-time rules like detection for maximum supported shared_buffers. It isn't one of the files installed by the package manager where the logic you're describing kicks in. The conflict case would show up, to use a RHEL example, if I edited a /etc/sysconfig/postgresql file and then a changed version of that file appeared upstream. Stuff in PGDATA is all yours and not tracked as a config file.

-- * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Parsing config files in a directory
On Wed, 28 Oct 2009, Tom Lane wrote:

> Why in the world are you looking at initdb? The standard reference for
> postgresql.conf-reading code, by definition, is guc-file.l. I think the
> odds of building something that works right, without borrowing that same
> flex logic, are about nil.

initdb was the only sample around that actually makes changes to the postgresql.conf. It's also a nice simple standalone program that's easy to borrow pieces from, which guc-file.l is not. That's the reason it looks tempting at first.

If as you say the only right way to do this is to use the flex logic, that just reinforces how high the bar is for someone who wants to write a tool that modifies the file. Periodically we get people who show up saying "hey, I'd like to write a little [web|cli|gui] tool to help people update their postgresql.conf file", and when the answer they get includes "first you need to implement this grammar..." that scares off almost all of them. It didn't work on me because I used to write compilers for fun before flex existed. But even I just skimmed it and pragmatically wrote a simpler postgresql.conf parser implementation that worked well enough to get a working prototype out the door, rather than properly implementing the whole grammar.

-- * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Parsing config files in a directory
Kevin,

> I'm talking about how the decision should be made as to which takes
> precedence. It's fine to document which one *was* chosen, but that
> doesn't eliminate the problem of conflicting settings making a mess.
> Someone else (Robert maybe?) gave an explicit example of how three
> files could have overlapping settings. Of course, *my* tool will name
> its configuration file "!.conf".

Hey, if a DBA wants to do that, then it's fine with me. They can check pg_settings afterwards to find out which was chosen.

The precedence issues you (and Robert) are citing are no different from what we have currently in a single file. I absolutely can't tell you the number of hacked-up postgresql.conf files I've seen with the same setting appearing in more than 3 places. And with the conf file being over 1000 lines long, it's easy to miss that someone or some tool added another instance of the variable at the bottom. Plus we already support includes of single files. Why is an include of a directory controversial? If someone doesn't want to use it, they don't have to.

If someone here thinks writing a tool which reliably parses and re-writes a hand-written postgresql.conf and runs on all the OSes we support is *easy*, then please write it for me! I'll happily use such a tool. But after wasting a couple dozen hours on the problem, I won't write one. Otherwise, please let us have our directory so that we can experiment with easy-to-write-and-revise autoconfig tools.

--Josh Berkus
Re: [HACKERS] Where's the docs?
On 10/27/09 9:25 PM, Tom Lane wrote: > Josh Berkus writes: >> OK, this is the genuine failure; the syntax is missing for column triggers: > >> CREATE TRIGGER name { BEFORE | AFTER } { event [ OR ... ] } >> ON table [ FOR [ EACH ] { ROW | STATEMENT } ] >> EXECUTE PROCEDURE function_name ( arguments ) > > It's embedded in "event", which isn't spelled out here. > Yeah, I couldn't figure it out from the docs, which means that other people won't be able to. Doc patch coming if I ever finish this server migration. --Josh Berkus
Re: [HACKERS] Parsing config files in a directory
On Wed, Oct 28, 2009 at 2:37 AM, Dimitri Fontaine wrote: > That's why I'm proposing the following API at file level: That's exactly the same as putting them all in the same file, only a different syntax. It still requires that any program understand what every other program was trying to do. >> It's much simpler and more reliable to have each program generate a >> separate file. > > On the viewpoint of the program itself only. For the DBA, that soon > becomes a nightmare because the same GUC could come from any number of > tools and the precedence rules, even explicit and as easy as > alphanumeric ordering (which locale already?), make it error prone. But the DBA *wants* to control those precedence rules. The automatic software certainly can't unless they know what other automatic software exists in the world -- or will exist in the future. > I really want to insist on having only ONE location for settings from > tools (all of them) and one location for manual/local editing. > >> time it's generated. It doesn't have to worry about anything else >> parsing or making sense of the file except the database server itself. > > But it'll never know if the settings it just generated are superseded by > some other tool's configuration file. That's precisely what makes things simpler. The less each module has to know about each other module the simpler and more reliable it will be. I actually would suggest that they check the current "source" by checking with postgres, just to give a warning. -- greg
Re: [HACKERS] Parsing config files in a directory
On Wed, Oct 28, 2009 at 7:33 AM, Alvaro Herrera wrote: > Greg Smith escribió: > >> This sounds familiar...oh, that's right, this is almost the same >> algorithm pgtune uses. And it sucks, It's also a blatant violation of packaging rules for Debian if not every distribution. If you edit the user's configuration file then there's no way to install a modified default configuration file. You can't tell the automatic modifications apart from the user's modifications. So the user will get a prompt asking if he wants the new config file or to keep his modifications which he never remembered making. -- greg
Re: [HACKERS] Parsing config files in a directory
Greg Smith writes: > The sketched out design I have for a contrib/pgtune in C presumes that I'd > start by refactoring the relevant bits from initdb into a library for both > programs to use. But the initdb code doesn't care about preserving > existing values when making changes to them; it just throws in its new > settings and moves along. So what's there already only handles about half > the annoying parts most people would expect a tuning tool that reads the > existing file and operates on it to do. > Also, I wouldn't be surprised to find that it chokes on some real-world > postgresql.conf files. The postgresql.conf.sample it's being fed is > fairly pristine. Indeed. Why in the world are you looking at initdb? The standard reference for postgresql.conf-reading code, by definition, is guc-file.l. I think the odds of building something that works right, without borrowing that same flex logic, are about nil. regards, tom lane
Re: [HACKERS] Parsing config files in a directory
On Wed, 28 Oct 2009, Alvaro Herrera wrote:

> Huh, isn't this code in initdb.c already?

The sketched out design I have for a contrib/pgtune in C presumes that I'd start by refactoring the relevant bits from initdb into a library for both programs to use. But the initdb code doesn't care about preserving existing values when making changes to them; it just throws in its new settings and moves along. So what's there already only handles about half the annoying parts most people would expect a tuning tool that reads the existing file and operates on it to do.

Also, I wouldn't be surprised to find that it chokes on some real-world postgresql.conf files. The postgresql.conf.sample it's being fed is fairly pristine. A tuning tool that intends to read any postgresql.conf it's fed can't always assume it's in exactly standard form. I've recently started collecting complicated postgresql.conf lines that crashed my Python code as people submit bug reports with those. You might be surprised at all of the places people put whitespace.

-- * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD
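The whitespace surprises Greg mentions are exactly what a tolerant reader has to absorb. Here is a sketch of such a single-line parser in Python — a simplification of what guc-file.l actually accepts (it ignores include directives and other corner cases), offered only to illustrate the variety of line shapes a tool must handle.

```python
import re

# A tolerant reader for postgresql.conf-style lines: optional '=',
# single-quoted values with '' escapes, trailing comments, and
# erratic whitespace.
LINE = re.compile(r"""
    ^\s*
    (?P<name>[A-Za-z_][A-Za-z0-9_]*)     # setting name
    \s*=?\s*                             # '=' is optional in the real grammar
    (?P<value>'(?:[^']|'')*'             # quoted value; '' escapes a quote
             |[^\s\#']+)                 # or a bare token
    \s*(?:\#.*)?$                        # optional trailing comment
""", re.VERBOSE)

def parse_line(line):
    line = line.strip()
    if not line or line.startswith("#"):
        return None                      # blank line or pure comment
    m = LINE.match(line)
    if not m:
        raise ValueError(f"unparsable line: {line!r}")
    value = m.group("value")
    if value.startswith("'"):
        value = value[1:-1].replace("''", "'")
    return m.group("name"), value
```

Even this small grammar has to cope with lines like `shared_buffers   128MB  # no equals sign` and quoted values containing commas and escaped quotes — the sort of input that breaks naive split-on-`=` tooling.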
[HACKERS] FOR UPDATE versus outer joins
While I'm fooling with the FOR UPDATE code ... Currently, you can't apply FOR UPDATE to a relation that's on the inner side of an outer join, eg

regression=# select * from a left join b using(aa) for update;
ERROR: SELECT FOR UPDATE/SHARE cannot be applied to the nullable side of an outer join

It would be a trivial code change to make this work by just not locking any row in b when a null-extended join row is formed. The argument against that was that FOR UPDATE is supposed to guarantee that the same rows can be fetched again, and it's not clear that the same null-extended row would be formed if the join were repeated. In particular, if the a row's join key has been changed by a concurrent update, we would still return the null-extended row, but there might now be rows in b that it can join to; which we won't find since we aren't repeating the whole join but only rechecking particular join pairs.

Do people still find that argument convincing, or would it be better to remove the restriction and let the code do the best it can? It seems to me that allowing FOR UPDATE with outer joins may be more useful than not allowing it, even given the caveat.

regards, tom lane
[HACKERS] Show schema size with \dn+
Is there any interest in expanding \dn+ to show schema size, similar to table sizes using \dt+ in 8.4? We use separate schemas for each user, so this would allow us to quickly look up the sizes of each user's data. I have little experience with C and none with the PostgreSQL code base -- where should I look to have a go at this? -- a.
Re: [HACKERS] Parsing config files in a directory
Alvaro Herrera wrote: > Kevin Grittner wrote: > >> But I think that's where the rub is -- when you >> have more than one source for information, what's the precedence? > > This is not a problem nowadays because that info is in pg_settings. > File name and line number. I'm talking about how the decision should be made as to which takes precedence. It's fine to document which one *was* chosen, but that doesn't eliminate the problem of conflicting settings making a mess. Someone else (Robert maybe?) gave an explicit example of how three files could have overlapping settings. Of course, *my* tool will name its configuration file "!.conf". -Kevin
Re: [HACKERS] Parsing config files in a directory
Alvaro Herrera wrote: Greg Smith escribió: I was thinking that the algorithm would be something like: "Read the old postgresql.conf and write it back out to a new file line by line This sounds familiar...oh, that's right, this is almost the same algorithm pgtune uses. And it sucks, and it's a pain to convert the tool into C because of it, and the fact that you have to write this sort of boring code before you can do a single line of productive work is one reason why we don't have more tools available; way too much painful grunt work to write. Huh, isn't this code in initdb.c already? Since it's BSD-licensed (or is it MIT?) you could just have lifted it. Surely this isn't the reason the tool isn't written in C. In any case, initdb has to be in C for portability reasons (I'm more aware of this than most ;-) ), but other tools don't unless the server has to rely on them. cheers andrew
Re: [HACKERS] Parsing config files in a directory
Kevin Grittner wrote: > But I think that's where the rub is -- when you > have more than one source for information, what's the precedence? This is not a problem nowadays because that info is in pg_settings. File name and line number. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Parsing config files in a directory
Greg Smith escribió: > >I was thinking that the algorithm would be something like: "Read > >the old postgresql.conf and write it back out to a new file line > >by line > > This sounds familiar...oh, that's right, this is almost the same > algorithm pgtune uses. And it sucks, and it's a pain to convert the > tool into C because of it, and the fact that you have to write this > sort of boring code before you can do a single line of productive > work is one reason why we don't have more tools available; way too > much painful grunt work to write. Huh, isn't this code in initdb.c already? Since it's BSD-licensed (or is it MIT?) you could just have lifted it. Surely this isn't the reason the tool isn't written in C. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Parsing config files in a directory
Forgive me for jumping in again on discussion of a feature I might never use, but as an "outside observer" something doesn't make sense to me.

Josh Berkus wrote:

> If you require that a tool (or SET PERSISTENT) parse through a file
> in order to change one setting, then you've just doubled or tripled
> the code size of the tool, as well as added a host of failure
> conditions which wouldn't have existed otherwise.

Not if there is one implementation which is distributed with PostgreSQL. Give it a clean API and a command-line application (for scripting in non-C languages) and this is a non-issue. This really seems like a red herring. I know it would be more lines in C than a bash script; but really, think about how little work this would be for any script which has grep and sed available -- at least if you assume it shouldn't follow include statements.

But I think that's where the rub is -- when you have more than one source for information, what's the precedence? That question doesn't go away with the proposed feature. It seems that in reading this thread I've seen a lot of conflicting notions on how it *should* work, with a handwavy assertion that it doesn't matter because the DBA can sort it all out. But then will the tools always do what people expect?

It seems like there's a significant base of users who want their database product to self-configure; and there's clearly a significant base of professional DBAs who want to be able to hand-tune for a variety of reasons. I assume that addressing these disparate needs is one of the goals here? As well as an easy way to drop in configuration for additional features? The directory seems to make sense for the latter, but seems horrible to me for the former. It turns the risk of a spaghetti configuration file into a sorcerer's apprentice collection of competing, conflicting files which are a worse mess than the spaghetti. Perhaps there should be two separate features for the two separate use cases?
-Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SE-PgSQL developer documentation (Re: Reworks for Access Control facilities (r2363))
KaiGai Kohei escribió:
> Alvaro Herrera wrote:
>> Now, let's assume that COPY data includes the security context for each
>> tuple in the output.
>
> When we support row-level security, it will be necessary to back up and
> restore the security context for each tuple.

Oh, right, that part is being left out. Sorry.

-- Alvaro Herrera  http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] SE-PgSQL developer documentation (Re: Reworks for Access Control facilities (r2363))
Alvaro Herrera wrote:
> KaiGai Kohei escribió:
>> There are two cases when we create a new object.
>>
>> 1) create a new object without any explicit security context.
>>    If we don't have statement support, it is the only case.
>>    In this case, SELinux computes a default security context to be
>>    assigned to the new object. It depends on the client's security
>>    context. Then, it checks "create" permission on the pair of the
>>    client's security context and the default security context. If not
>>    allowed, an error is raised.
> So, following this path, it is possible to write pg_dump support without
> explicit security contexts: you have to make pg_dump write out the
> different tuples under different users. So you'd have more than one data
> object in the dump output for each table, one for every existing security
> context. This seems extremely difficult and intrusive, however. It seems
> that having explicit security contexts in statements is necessary for
> this area to be reasonably simple.

Yes, it may be possible to restore the tables without statement support, if we switch the OS user's privilege for each table, but that is unreasonable and unrealistic.

> Now, let's assume that COPY data includes the security context for each
> tuple in the output.

When we support row-level security, it will be necessary to back up and restore the security context for each tuple. What I'm talking about here is how we specify the security context of new tables. If we have statement support, it will be specified as follows:

    CREATE TABLE t (
        a    int primary key,
        b    text
    ) SECURITY_CONTEXT = 'system_u:object_r:sepgsql_ro_table_t:unclassified';

> How is that data restored? Would you need to grant super-SEPostgres
> privileges to the user restoring the data?

We need to restore the backup as a user who has at least the privileges to create the dumped database objects. But there is no need to have a super-privilege.
For example, if all the dumped tables are labeled as either "unclassified" or "classified" but not "secret", the only privilege needed is to create "unclassified" and "classified" tables, not "secret" tables. However, I expect that an "unconfined" domain does the backup/restore work, except on especially secure systems. I don't think the default security policy (such as the one bundled with Fedora) should restrict the privileges of a DBA connected from a shell process.

Thanks, -- KaiGai Kohei
Re: [HACKERS] Where's the docs?
Peter Eisentraut writes:
> On Wed, 2009-10-28 at 00:25 -0400, Tom Lane wrote:
>> Josh Berkus writes:
>>> OK, this is the genuine failure; the syntax is missing for column triggers:
>>>
>>>     CREATE TRIGGER name { BEFORE | AFTER } { event [ OR ... ] }
>>>         ON table [ FOR [ EACH ] { ROW | STATEMENT } ]
>>>         EXECUTE PROCEDURE function_name ( arguments )
>> It's embedded in "event", which isn't spelled out here.
> I know this is a bit suboptimal, but I couldn't think of a better way
> without cluttering up too many things.

Expanding "event" in place definitely wouldn't be very readable. The only alternative that seems sane is to do something like what we do in SELECT:

    CREATE TRIGGER ...

    where "event" is:

        INSERT | UPDATE [ ( column [, ...] ) ] | DELETE | TRUNCATE

I am not convinced that it's worth it, but maybe. Certainly the current psql "\h CREATE TRIGGER" display is not helpful at reminding you where to put the column names.

regards, tom lane
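As a concrete illustration of where the column names go, here is a column trigger in the form PostgreSQL documents, which attaches the column list to the UPDATE event with OF; the table, column, and function names below are made up:

```sql
-- Hypothetical names; the trigger fires only when price or discount changes.
CREATE TRIGGER audit_price
    AFTER UPDATE OF price, discount ON products
    FOR EACH ROW
    EXECUTE PROCEDURE log_price_change();
```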
Re: [HACKERS] SE-PgSQL developer documentation (Re: Reworks for Access Control facilities (r2363))
KaiGai Kohei escribió:
> There are two cases when we create a new object.
>
> 1) create a new object without any explicit security context.
>    If we don't have statement support, it is the only case.
>    In this case, SELinux computes a default security context to be
>    assigned to the new object. It depends on the client's security
>    context. Then, it checks "create" permission on the pair of the
>    client's security context and the default security context. If not
>    allowed, an error is raised.

So, following this path, it is possible to write pg_dump support without explicit security contexts: you have to make pg_dump write out the different tuples under different users. So you'd have more than one data object in the dump output for each table, one for every existing security context. This seems extremely difficult and intrusive, however. It seems that having explicit security contexts in statements is necessary for this area to be reasonably simple.

Now, let's assume that COPY data includes the security context for each tuple in the output. How is that data restored? Would you need to grant super-SEPostgres privileges to the user restoring the data?

-- Alvaro Herrera  http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] SE-PgSQL developer documentation (Re: Reworks for Access Control facilities (r2363))
Heikki Linnakangas wrote:
> KaiGai Kohei wrote:
>> Heikki Linnakangas wrote:
>>> KaiGai Kohei wrote:
>>>> Robert Haas wrote:
>>>>> 2009/10/27 KaiGai Kohei:
>>>>>> - no statement support to specify security context.
>>>>>>   (It makes it impossible to add support in pg_dump. Is it really OK?)
>>>>> I doubt that anything without pg_dump support would be even vaguely OK...
>>>> In my previous experience, it enabled me to reduce the patch by 300-400
>>>> lines. But there is no more to it than those 300-400 lines.
>>>> Honestly, I would like to include a feature to specify an explicit
>>>> security context in the patch from the beginning.
>>>> (It also allows attaching test cases with more variations.)
>>> Can you explain why that's required for pg_dump support? I was thinking
>>> that there would be no explicit security labels on objects, and
>>> permissions would be checked based on other inherent properties of the
>>> object, like owner, name, schema etc.
>> In the SELinux model, the security context is the only property which can
>> be used for decision making based on the security policy.
>> It never uses any other properties, like owner, name, ...
> The security context doesn't necessarily need to be given explicitly.
> Things like network ports and files in filesystems that don't support
> security labels are assigned a security context based on some external
> policy.
>
> Hmm, I guess the whole feature becomes completely pointless if all
> objects always have their default labels, and can't be changed. So I
> guess we need that.
>
> I think this discussion started when I wondered why we can't put the
> SE-PgSQL check for creating an object (e.g. a table) into
> pg_namespace_aclcheck() without changing the signature. The reason you
> gave is that we need the security context of the new table being created
> to decide if creating such a table is allowed. But assuming that the new
> table inherits the security context of the schema it's created in,
> pg_namespace_aclcheck() *does* have all the necessary information: it
> knows the namespace which determines the new object's security context.
> As long as we don't provide syntax to define the security context in the
> CREATE command, we're fine, even if there's an ALTER command to change
> the security context of the object after the creation.

What I pointed out is just one of the problems we face if we try to deploy the SE-PgSQL hooks within the aclchk.c routines.

For example, pg_namespace_aclcheck() with ACL_CREATE is not invoked only just before creation of a new table. It is also called when we create a new function, type, conversion and so on.

For example, pg_namespace_aclcheck() does not take an argument to deliver the column definitions of a new table. When columns are inherited from the parent table, we have to copy the security context of each parent column, but we cannot know the column definitions inside pg_namespace_aclcheck(). (It would need to be called after MergeAttributes(), but pg_namespace_aclcheck() is called before that.)

For example, the SE-PgSQL model distinguishes "setattr" permission from "drop", but pg_class_ownercheck() is used for both ALTER and DROP statements, so we cannot know from inside pg_class_ownercheck() which permission should be applied.

For example, ...

At the first commitfest, it was suggested that I change the definitions of the default PG access control routines to deliver the information needed for both DAC and MAC, if pg_xxx_aclcheck() is not suitable for the SELinux model. I developed junk as the result. :(

> I'm not sure how much of a difference that detail makes in the big
> scheme of things, I'm just trying to find ways to make the patch
> minimally invasive..

Basically, I don't think we should change the pg_xxx_aclcheck() and pg_xxx_ownercheck() routines, because they implement the default PG model well. If we try to call DAC and MAC from common entry points, it brings us much pain, as we learned from hard experience. What I would like to suggest instead is to put MAC hooks at the strategic points.
The hooks are just invocations of sepgsql_*() functions, so they do not need much rework of the core routines. I believe this is the minimally invasive way.

The Linux kernel is one of the best examples of this practice. It deploys hooks that call MAC checks with the needed information (such as inode, task_struct, ...) at strategic points in the kernel. Basically, DAC and MAC work orthogonally, so it is quite a natural design. The specifications of the hooks are documented clearly in the source code, so folks without a special focus on security can also know what information should be given and what result will be returned.

What I would like to suggest is a similar approach. So, now I'm working on writing documentation from the developer's viewpoint, and on coding the SE-PgSQL routines with comments about their specifications.

Thanks, -- KaiGai Kohei
Re: [HACKERS] SE-PgSQL developer documentation (Re: Reworks for Access Control facilities (r2363))
KaiGai Kohei wrote:
> Heikki Linnakangas wrote:
>> KaiGai Kohei wrote:
>>> Robert Haas wrote:
>>>> 2009/10/27 KaiGai Kohei:
>>>>> - no statement support to specify security context.
>>>>>   (It makes it impossible to add support in pg_dump. Is it really OK?)
>>>> I doubt that anything without pg_dump support would be even vaguely OK...
>>> In my previous experience, it enabled me to reduce the patch by 300-400
>>> lines. But there is no more to it than those 300-400 lines.
>>>
>>> Honestly, I would like to include a feature to specify an explicit
>>> security context in the patch from the beginning.
>>> (It also allows attaching test cases with more variations.)
>> Can you explain why that's required for pg_dump support? I was thinking
>> that there would be no explicit security labels on objects, and
>> permissions would be checked based on other inherent properties of the
>> object, like owner, name, schema etc.
> In the SELinux model, the security context is the only property which can
> be used for decision making based on the security policy.
> It never uses any other properties, like owner, name, ...

The security context doesn't necessarily need to be given explicitly. Things like network ports and files in filesystems that don't support security labels are assigned a security context based on some external policy.

Hmm, I guess the whole feature becomes completely pointless if all objects always have their default labels, and can't be changed. So I guess we need that.

I think this discussion started when I wondered why we can't put the SE-PgSQL check for creating an object (e.g. a table) into pg_namespace_aclcheck() without changing the signature. The reason you gave is that we need the security context of the new table being created to decide if creating such a table is allowed.
But assuming that the new table inherits the security context of the schema it's created in, pg_namespace_aclcheck() *does* have all the necessary information: it knows the namespace, which determines the new object's security context. As long as we don't provide syntax to define the security context in the CREATE command, we're fine, even if there's an ALTER command to change the security context of the object after the creation.

I'm not sure how much of a difference that detail makes in the big scheme of things, I'm just trying to find ways to make the patch minimally invasive..

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Parsing config files in a directory
On Tue, Oct 27, 2009 at 11:40 PM, Josh Berkus wrote:
> On 10/27/09 8:24 PM, Robert Haas wrote:
>> read the old postgresql.conf and
>> write it back out to a new file line by line. If, in the process of
>> doing this, you find a setting for the variable you're trying to
>> change, then write out the new line in place of the original line.
>
> You've hit the problem on the head right there. The requirement to do
> something like that is *exactly* the problem which makes writing
> config-management tools hard/impossible.
>
> If you require that a tool (or SET PERSISTENT) parse through a file in
> order to change one setting, then you've just doubled or tripled the
> code size of the tool, as well as added a host of failure conditions
> which wouldn't have existed otherwise.

I think you're just trading one set of failure conditions for another. Now instead of having one unparseable configuration file you're going to have a whole pile of them with possibly-conflicting settings.

> You're hearing from the people who are working on tools: requiring that
> any tool parse a hand-written config file is a non-starter.

Yep, and I'm baffled by that, because I understand neither why it's hard nor what the reasonable alternatives are. The algorithm I just proposed can be implemented by a very short Perl script. But my bafflement doesn't (and isn't intended to) prevent others from implementing what they like. As Tom is fond of saying (and it's 10x more true of me), I'm not the only vote here.

...Robert
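Robert's line-by-line algorithm really is short; here is a minimal sketch in shell under the same assumptions (it ignores include directives, quoting, and the optional '=' that real postgresql.conf lines allow -- the file and GUC names are illustrative):

```shell
# Replace one setting in a postgresql.conf-style file, line by line;
# if the setting (possibly commented out) is not found, append it.
set_guc() {
    name=$1; value=$2; file=$3
    if grep -Eq "^[[:space:]]*#?[[:space:]]*${name}[[:space:]]*=" "$file"; then
        sed -E -i "s|^[[:space:]]*#?[[:space:]]*${name}[[:space:]]*=.*|${name} = ${value}|" "$file"
    else
        echo "${name} = ${value}" >> "$file"
    fi
}
```

A real tool would also need to honor include directives and the last-setting-wins rule, which is exactly the complexity the thread is arguing about.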
Re: [HACKERS] Parsing config files in a directory
Greg Stark writes:
> On Tue, Oct 27, 2009 at 8:40 PM, Josh Berkus wrote:
>> You're hearing from the people who are working on tools: requiring that
>> any tool parse a hand-written config file is a non-starter.
>
> It can be done, pgadmin actually does it currently. But I totally
> agree it's a bad idea.
>
> But the difficulty of parsing the handwritten stuff is not the only
> reason it's a bad idea. Any time you have multiple pieces of software,
> to say nothing of humans, editing the same file you're going to have
> headaches. They need to agree on everything and be able to handle
> anything any other program generates. Such a file would be a kind of
> API itself.

That's why I'm proposing the following API at the file level:

- 1 file per GUC
- the file name is {class.}guc_name.conf
- the first line contains only the value of the setting
- the rest of the file contains comments

Now any tool can see the current value for itself, and changing it while keeping the old one as a comment is easy too:

    $ myguc=`cat postgresql.conf.d/my_guc.conf`
    $ (echo newvalue; echo $myguc) > postgresql.conf.d/my_guc.conf

Furthermore, extensions are required to use a custom class, so they will need to edit custom_variable_classes and then their own files. Any tool could support editing those files too; it's rather easy until you want to provide wizard-like knowledge to the user. A dedicated facility to add a new class to the custom_variable_classes GUC could be devised later, but that doesn't feel like it belongs in this patch's playground.

> It's much simpler and more reliable to have each program generate a
> separate file.

Only from the viewpoint of the program itself. For the DBA, that soon becomes a nightmare, because the same GUC could come from any number of tools, and the precedence rules, even explicit and as easy as alphanumeric ordering (in which locale?), make it error prone. I really want to insist on having only ONE location for settings from tools (all of them) and one location for manual/local editing.
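The proposed file-level API is small enough to sketch in shell; everything beyond the layout described above (helper names, error handling) is an assumption:

```shell
confdir=postgresql.conf.d   # per the proposal; override as needed

# Print the current value of a GUC: the first line of its file.
read_guc() {
    head -n 1 "$confdir/$1.conf"
}

# Set a new value, keeping the previous file content as comments.
write_guc() {
    f="$confdir/$1.conf"
    old=""
    if [ -f "$f" ]; then
        old=$(sed 's/^#* */# /' "$f")   # comment out old value and comments
    fi
    printf '%s\n' "$2" > "$f"
    if [ -n "$old" ]; then
        printf '%s\n' "$old" >> "$f"
    fi
}
```

With this shape, any tool (or SET PERSISTENT itself) can read and write one GUC without parsing a hand-written file.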
> time it's generated. It doesn't have to worry about anything else
> parsing or making sense of the file except the database server itself.

But it'll never know whether the settings it just generated are superseded by some other tool's configuration file. With my proposal, the SET PERSISTENT command can easily warn the user: as soon as the current source for the GUC is NOT postgresql.conf.d, you know you're not affecting anything; it was the DBA's choice to manually set something else. If it happens that you are the DBA, you can go edit postgresql.conf to comment out the GUC and enjoy your new tool suite.

Regards, -- dim
Re: [HACKERS] SE-PgSQL developer documentation (Re: Reworks for Access Control facilities (r2363))
Heikki Linnakangas wrote:
> KaiGai Kohei wrote:
>> Robert Haas wrote:
>>> 2009/10/27 KaiGai Kohei:
>>>> - no statement support to specify security context.
>>>>   (It makes it impossible to add support in pg_dump. Is it really OK?)
>>> I doubt that anything without pg_dump support would be even vaguely OK...
>> In my previous experience, it enabled me to reduce the patch by 300-400
>> lines. But there is no more to it than those 300-400 lines.
>>
>> Honestly, I would like to include a feature to specify an explicit
>> security context in the patch from the beginning.
>> (It also allows attaching test cases with more variations.)
> Can you explain why that's required for pg_dump support? I was thinking
> that there would be no explicit security labels on objects, and
> permissions would be checked based on other inherent properties of the
> object, like owner, name, schema etc.

In the SELinux model, the security context is the only property which can be used for decision making based on the security policy. It never uses any other properties, like owner, name, ...

There are two cases when we create a new object.

1) Create a new object without any explicit security context.
   If we don't have statement support, this is the only case.
   In this case, SELinux computes a default security context to be assigned
   to the new object. It depends on the client's security context.
   Then, it checks "create" permission on the pair of the client's security
   context and the default security context. If not allowed, an error is
   raised.

2) Create a new object with an explicit security context.
   In this case, the given explicit security context is assigned.
   SELinux checks "create" permission on the pair of the client's security
   context and the given explicit security context. If not allowed, an
   error is raised.

Please note that SELinux assigns a security context to the managed object in either case.
If we don't have any statement support, there is no way to specify an explicit security context on a new object at creation time. It also means we cannot correctly restore the security contexts of objects.

Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei
Re: [HACKERS] SE-PgSQL developer documentation (Re: Reworks for Access Control facilities (r2363))
KaiGai Kohei wrote:
> Robert Haas wrote:
>> 2009/10/27 KaiGai Kohei:
>>> - no statement support to specify security context.
>>>   (It makes it impossible to add support in pg_dump. Is it really OK?)
>> I doubt that anything without pg_dump support would be even vaguely OK...
>
> In my previous experience, it enabled me to reduce the patch by 300-400
> lines. But there is no more to it than those 300-400 lines.
>
> Honestly, I would like to include a feature to specify an explicit
> security context in the patch from the beginning.
> (It also allows attaching test cases with more variations.)

Can you explain why that's required for pg_dump support? I was thinking that there would be no explicit security labels on objects, and permissions would be checked based on other inherent properties of the object, like owner, name, schema etc.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com