Re: [HACKERS] Where's the docs?

2009-10-28 Thread Peter Eisentraut
On Wed, 2009-10-28 at 00:25 -0400, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
  OK, this is the genuine failure; the syntax is missing for column triggers:
 
  CREATE TRIGGER name { BEFORE | AFTER } { event [ OR ... ] }
  ON table [ FOR [ EACH ] { ROW | STATEMENT } ]
  EXECUTE PROCEDURE function_name ( arguments )
 
 It's embedded in event, which isn't spelled out here.

I know this is a bit suboptimal, but I couldn't think of a better way
without cluttering up too many things.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SE-PgSQL developer documentation (Re: Reworks for Access Control facilities (r2363))

2009-10-28 Thread Heikki Linnakangas
KaiGai Kohei wrote:
 Robert Haas wrote:
 2009/10/27 KaiGai Kohei kai...@ak.jp.nec.com:
 - no statement support to specify security context.
  (It makes it impossible to add support in pg_dump. Is it really OK?)
 I doubt that anything without pg_dump support would be even vaguely OK...
 
 In my previous experience, it reduced the patch by 300-400 lines.
 But there is no benefit beyond those 300-400 lines.
 
 Honestly, I would like to include a feature to specify an explicit security
 context in the patch from the beginning.
 (It also allows us to attach test cases with more variations.)

Can you explain why that's required for pg_dump support? I was thinking
that there would be no explicit security labels on objects, and
permissions would be checked based on other inherent properties of the
object, like owner, name, schema etc.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] SE-PgSQL developer documentation (Re: Reworks for Access Control facilities (r2363))

2009-10-28 Thread KaiGai Kohei
Heikki Linnakangas wrote:
 KaiGai Kohei wrote:
 Robert Haas wrote:
 2009/10/27 KaiGai Kohei kai...@ak.jp.nec.com:
 - no statement support to specify security context.
  (It makes it impossible to add support in pg_dump. Is it really OK?)
 I doubt that anything without pg_dump support would be even vaguely OK...
 In my previous experience, it reduced the patch by 300-400 lines.
 But there is no benefit beyond those 300-400 lines.

 Honestly, I would like to include a feature to specify an explicit security
 context in the patch from the beginning.
 (It also allows us to attach test cases with more variations.)
 
 Can you explain why that's required for pg_dump support? I was thinking
 that there would be no explicit security labels on objects, and
 permissions would be checked based on other inherent properties of the
 object, like owner, name, schema etc.

In the SELinux model, the security context is the only property used for
decision making based on the security policy.
It never uses other properties such as owner, name, and so on.

There are two cases when we create a new object.

1) Create a new object without any explicit security context.
If we don't have statement support, this is the only case.
Here, SELinux computes a default security context to be assigned to
the new object; it depends on the client's security context.
Then it checks the create permission on the pair of the client's security
context and the default security context. If that is not allowed, an
error is raised.

2) Create a new object with an explicit security context.
In this case, the given explicit security context is assigned.
SELinux checks the create permission on the pair of the client's security
context and the given explicit security context. If that is not allowed,
an error is raised.

Please note that SELinux assigns a security context to the managed object
in either case.

If we don't have any statement support, there is no way to specify
an explicit security context for a new object at creation time.
It also means we cannot correctly restore the security context of objects.
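
As a toy model only (the helper names below are stand-ins, not the real
libselinux API), the two creation cases amount to:

```python
# Toy model of the two object-creation cases described above.
# POLICY_ALLOWED and all function names are illustrative, not real APIs.

# Pairs of (client context, object context) the loaded policy allows to
# "create"; purely illustrative labels.
POLICY_ALLOWED = {
    ("staff_t", "sepgsql_table_t"),
    ("staff_t", "sepgsql_ro_table_t"),
}

def compute_default_context(client_context):
    # Case 1: the default label is derived from the client's context.
    return "sepgsql_table_t"

def check_create(client_context, object_context):
    # "create" is checked on the (client, object-label) pair.
    if (client_context, object_context) not in POLICY_ALLOWED:
        raise PermissionError("SELinux: create denied")

def create_object(client_context, explicit_context=None):
    # Case 2 uses the supplied label; case 1 computes a default.
    # Either way, the new object ends up with a security context.
    context = explicit_context or compute_default_context(client_context)
    check_create(client_context, context)
    return context
```

Without statement support, only the first path is reachable, which is why
a dump could never restore an object's non-default label.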

Thanks,
-- 
OSS Platform Development Division, NEC
KaiGai Kohei kai...@ak.jp.nec.com



Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Dimitri Fontaine
Greg Stark gsst...@mit.edu writes:
 On Tue, Oct 27, 2009 at 8:40 PM, Josh Berkus j...@agliodbs.com wrote:
 You're hearing from the people who are working on tools: requiring that
 any tool parse a hand-written config file is a non-starter.

 It can be done, pgadmin actually does it currently. But I totally
 agree it's a bad idea.

 But the difficulty of parsing the handwritten stuff is not the only
 reason it's a bad idea. Any time you have multiple pieces of software,
 to say nothing of humans, editing the same file you're going to have
 headaches. They need to agree on everything and be able to handle
 anything any other program generates. Such a file would be a kind of
 API itself.

That's why I'm proposing the following API at file level:
 - 1 file per GUC
 - file name is {class.}guc_name.conf
 - first line only contains value of setting
 - rest of the file contains comments

Now any tool can read the current value for itself, and changing it while
keeping the old one as a comment is easy too:
 $ myguc=`cat postgresql.conf.d/my_guc.conf`
 $ (echo newvalue; echo "$myguc") > postgresql.conf.d/my_guc.conf
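
A tool could do the same read-modify-write programmatically; a minimal
sketch under the proposed layout (first line is the value, the rest is
comments; paths and GUC names are illustrative):

```python
import os

def read_guc(confdir, name):
    # First line of {class.}guc_name.conf is the value.
    with open(os.path.join(confdir, name + ".conf")) as f:
        return f.readline().rstrip("\n")

def write_guc(confdir, name, value):
    # Keep the old value as a comment, like the shell example above.
    path = os.path.join(confdir, name + ".conf")
    old = ""
    if os.path.exists(path):
        with open(path) as f:
            old = "# was: " + f.readline().rstrip("\n") + "\n" + f.read()
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(value + "\n" + old)
    os.rename(tmp, path)  # atomic on POSIX: readers never see a torn file
```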

Furthermore, extensions are required to use a custom class, so they will
need to edit custom_variable_classes and then their own files. Any tool
could support editing those files too; it's rather easy until you want
to provide wizard-like knowledge about specific settings to the user.

A dedicated facility to add a new class to the custom_variable_classes
GUC could be devised later, but that doesn't feel like it belongs in this
patch.

 It's much simpler and more reliable to have each program generate a
 separate file. 

Only from the viewpoint of the program itself. For the DBA, that soon
becomes a nightmare, because the same GUC could come from any number of
tools, and the precedence rules, even when explicit and as simple as
alphanumeric ordering (in which locale, by the way?), make it error prone.

I really want to insist on having only ONE location for settings from
tools (all of them) and one location for manual/local editing.

 time it's generated. It doesn't have to worry about anything else
 parsing or making sense of the file except the database server itself.

But it'll never know whether the settings it just generated are superseded
by some other tool's configuration file. With my proposal, the SET
PERSISTENT command can easily warn the user: as soon as the current source
for the GUC is NOT postgresql.conf.d, you know you're not affecting
anything; it was the DBA's choice to manually set something else.

If it happens that you are the DBA, you can edit postgresql.conf to
comment out the GUC and enjoy your new tool suite.

Regards,
-- 
dim



Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Robert Haas
On Tue, Oct 27, 2009 at 11:40 PM, Josh Berkus j...@agliodbs.com wrote:
 On 10/27/09 8:24 PM, Robert Haas wrote:
 read the old postgresql.conf and
 write it back out to a new file line by line.  If, in the process of
 doing this, you find a setting for the variable you're trying to
 change, then write out the new line in place of the original line.

 You've hit the problem on the head right there.  The requirement to do
 something like that is *exactly* the problem which makes writing
 config-management tools hard/impossible.

 If you require that a tool (or SET PERSISTENT) parse through a file in
 order to change one setting, then you've just doubled or tripled the
 code size of the tool, as well as added a host of failure conditions
 which wouldn't have existed otherwise.

I think you're just trading one set of failure conditions for another:
now, instead of having one unparseable configuration file, you're going
to have a whole pile of them with possibly-conflicting settings.

 You're hearing from the people who are working on tools: requiring that
 any tool parse a hand-written config file is a non-starter.

Yep: and I'm baffled by that, because I understand neither why it's
hard nor what the reasonable alternatives are.  The algorithm I just
proposed can be implemented by a very short Perl script.  But my
bafflement doesn't (and isn't intended to) prevent others from
implementing what they like.  As Tom is fond of saying (and it's 10x
more true of me), I'm not the only vote here.
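
For what it's worth, the line-by-line algorithm sketched above fits in a
few lines of Python as well (a sketch, not a hardened parser; it ignores
include files and quoting subtleties):

```python
import re

def set_guc(lines, name, value):
    """Rewrite postgresql.conf lines, replacing (or appending) one setting.

    Matches the first active or commented-out occurrence of the setting;
    everything else is copied through unchanged.
    """
    pattern = re.compile(r"^\s*#?\s*%s\s*=" % re.escape(name))
    out, done = [], False
    for line in lines:
        if not done and pattern.match(line):
            out.append("%s = %s\n" % (name, value))
            done = True
        else:
            out.append(line)
    if not done:
        out.append("%s = %s\n" % (name, value))
    return out
```

In a tool you would feed it the old file's lines, write the result to a
temporary file, and rename that over the original.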

...Robert



Re: [HACKERS] SE-PgSQL developer documentation (Re: Reworks for Access Control facilities (r2363))

2009-10-28 Thread Heikki Linnakangas
KaiGai Kohei wrote:
 Heikki Linnakangas wrote:
 KaiGai Kohei wrote:
 Robert Haas wrote:
 2009/10/27 KaiGai Kohei kai...@ak.jp.nec.com:
 - no statement support to specify security context.
  (It makes it impossible to add support in pg_dump. Is it really OK?)
 I doubt that anything without pg_dump support would be even vaguely OK...
 In my previous experience, it reduced the patch by 300-400 lines.
 But there is no benefit beyond those 300-400 lines.

 Honestly, I would like to include a feature to specify an explicit security
 context in the patch from the beginning.
 (It also allows us to attach test cases with more variations.)
 Can you explain why that's required for pg_dump support? I was thinking
 that there would be no explicit security labels on objects, and
 permissions would be checked based on other inherent properties of the
 object, like owner, name, schema etc.
 
 In the SELinux model, the security context is the only property used for
 decision making based on the security policy.
 It never uses other properties such as owner, name, and so on.

The security context doesn't necessarily need to be given explicitly.
Things like network ports and files in filesystems that don't support
security labels are assigned a security context based on some external
policy.

Hmm, I guess the whole feature becomes completely pointless if all
objects always have their default labels, and can't be changed. So I
guess we need that.

I think this discussion started when I wondered why we can't put the
SE-pgsql check for creating an object (e.g. a table) into
pg_namespace_aclcheck() without changing the signature. The reason you
gave is that we need the security context of the new table being created
to decide if creating such a table is allowed. But assuming that the new
table inherits the security context of the schema it's created in,
pg_namespace_aclcheck() *does* have all the necessary information: it
knows the namespace which determines the new object's security context.
As long as we don't provide syntax to define the security context in the
CREATE command, we're fine, even if there's an ALTER command to change
the security context of the object after the creation.

I'm not sure how much of a difference that detail makes in the big
scheme of things, I'm just trying to find ways to make the patch
minimally invasive.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] SE-PgSQL developer documentation (Re: Reworks for Access Control facilities (r2363))

2009-10-28 Thread KaiGai Kohei

Heikki Linnakangas wrote:
 KaiGai Kohei wrote:
 Heikki Linnakangas wrote:
 KaiGai Kohei wrote:
 Robert Haas wrote:
 2009/10/27 KaiGai Kohei kai...@ak.jp.nec.com:
 - no statement support to specify security context.
  (It makes it impossible to add support in pg_dump. Is it really OK?)
 I doubt that anything without pg_dump support would be even vaguely OK...
 In my previous experience, it reduced the patch by 300-400 lines.
 But there is no benefit beyond those 300-400 lines.

 Honestly, I would like to include a feature to specify an explicit security
 context in the patch from the beginning.
 (It also allows us to attach test cases with more variations.)
 Can you explain why that's required for pg_dump support? I was thinking
 that there would be no explicit security labels on objects, and
 permissions would be checked based on other inherent properties of the
 object, like owner, name, schema etc.
 In the SELinux model, the security context is the only property used for
 decision making based on the security policy.
 It never uses other properties such as owner, name, and so on.

 The security context doesn't necessarily need to be given explicitly.
 Things like network ports and files in filesystems that don't support
 security labels are assigned a security context based on some external
 policy.

 Hmm, I guess the whole feature becomes completely pointless if all
 objects always have their default labels, and can't be changed. So I
 guess we need that.

 I think this discussion started when I wondered why we can't put the
 SE-pgsql check for creating an object (e.g. a table) into
 pg_namespace_aclcheck() without changing the signature. The reason you
 gave is that we need the security context of the new table being created
 to decide if creating such a table is allowed. But assuming that the new
 table inherits the security context of the schema it's created in,
 pg_namespace_aclcheck() *does* have all the necessary information: it
 knows the namespace which determines the new object's security context.
 As long as we don't provide syntax to define the security context in the
 CREATE command, we're fine, even if there's an ALTER command to change
 the security context of the object after the creation.

What I pointed out is just one of several problems if we try to deploy
SE-PgSQL hooks within the aclchk.c routines.

For example, pg_namespace_aclcheck() with ACL_CREATE is not invoked only
just before the creation of a new table. It is also called when we create
a new function, type, conversion and so on.

For example, pg_namespace_aclcheck() does not take an argument delivering
the column definitions of the new table. When columns are inherited from
the parent table, we have to copy the security context of each parent
column, but we cannot know the columns' definitions inside
pg_namespace_aclcheck(). (It would need to be called after
MergeAttributes(), but pg_namespace_aclcheck() is called before that.)

For example, the SE-PgSQL model distinguishes the setattr permission from
drop, but pg_class_ownercheck() is used for both the ALTER and DROP
statements. So we cannot know, from inside pg_class_ownercheck(), which
permission should be applied.

For example, ...

At the first commit fest, it was suggested that I change the definitions
of the default PG access control routines to deliver the information
needed for both DAC and MAC, if pg_xxx_aclcheck() is not suitable for the
SELinux model. What I developed as a result was junk. :(


 I'm not sure how much of a difference that detail makes in the big
 scheme of things, I'm just trying to find ways to make the patch
 minimally invasive.

Basically, I don't think we should change the pg_xxx_aclcheck() and
pg_xxx_ownercheck() routines, because they implement the default PG model
well. If we try to call DAC and MAC from common entry points, it causes
a great deal of pain, as we have learned from hard experience.

What I would like to suggest is to put MAC hooks at strategic points.
The hooks are just invocations of sepgsql_*() functions, so they do not
require much rework of the core routines.
I believe this is the minimally invasive way.

The Linux kernel is one of the best examples of this practice. It deploys
hooks that call the MAC checks with the needed information (such as inode,
task_struct, ...) at strategic points in the kernel. DAC and MAC basically
work orthogonally, so it is quite a natural design.

Its specifications are clearly documented in the source code, so folks
without a special focus on security can also see what information should
be given and what result will be returned.
What I would like to suggest is a similar approach. So I'm now writing
documentation from the developer's viewpoint, and coding the SE-PgSQL
routines with comments about their specifications.

Thanks,
--
KaiGai Kohei kai...@kaigai.gr.jp



Re: [HACKERS] SE-PgSQL developer documentation (Re: Reworks for Access Control facilities (r2363))

2009-10-28 Thread Alvaro Herrera
KaiGai Kohei escribió:

 There are two cases when we create a new object.
 
 1) Create a new object without any explicit security context.
 If we don't have statement support, this is the only case.
 Here, SELinux computes a default security context to be assigned to
 the new object; it depends on the client's security context.
 Then it checks the create permission on the pair of the client's security
 context and the default security context. If that is not allowed, an
 error is raised.

So, following this path, it is possible to write pg_dump support without
explicit security contexts: you would have to make pg_dump write out the
different tuples under different users.  So you'd have more than one
data object in the dump output for each table, one for every existing
security context.  This seems extremely difficult and intrusive, however.

It seems that having explicit security contexts in statements is
necessary for this area to be reasonably simple.

Now, let's assume that COPY data includes the security context for each
tuple in the output.  How is that data restored?  Would you need to
grant super-SEPostgres privileges to the user restoring the data?

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Where's the docs?

2009-10-28 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On Wed, 2009-10-28 at 00:25 -0400, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
 OK, this is the genuine failure; the syntax is missing for column triggers:

 CREATE TRIGGER name { BEFORE | AFTER } { event [ OR ... ] }
 ON table [ FOR [ EACH ] { ROW | STATEMENT } ]
 EXECUTE PROCEDURE function_name ( arguments )

 It's embedded in event, which isn't spelled out here.

 I know this is a bit suboptimal, but I couldn't think of a better way
 without cluttering up too many things.

Expanding event in-place definitely wouldn't be very readable.  The
only alternative that seems sane is to do something like what we do
in SELECT:

CREATE TRIGGER ...

where event is:

INSERT | UPDATE [ ( column [,...] ) ] | DELETE | TRUNCATE

I am not convinced that it's worth it, but maybe.  Certainly the current
psql \h CREATE TRIGGER display is not helpful at reminding you where
to put the column names.
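
For illustration, under the column-trigger syntax being discussed, such a
trigger would presumably be written along these lines (a hypothetical
example of the proposed form, with made-up table and function names):

```sql
-- Fire only when the "price" column of "items" is updated:
CREATE TRIGGER items_price_check
    AFTER UPDATE OF price ON items
    FOR EACH ROW
    EXECUTE PROCEDURE check_price();
```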

regards, tom lane



Re: [HACKERS] SE-PgSQL developer documentation (Re: Reworks for Access Control facilities (r2363))

2009-10-28 Thread KaiGai Kohei

Alvaro Herrera wrote:
 KaiGai Kohei escribió:
 
 There are two cases when we create a new object.
 
 1) Create a new object without any explicit security context.
 If we don't have statement support, this is the only case.
 Here, SELinux computes a default security context to be assigned to
 the new object; it depends on the client's security context.
 Then it checks the create permission on the pair of the client's security
 context and the default security context. If that is not allowed, an
 error is raised.
 
 So, following this path, it is possible to write pg_dump support without
 explicit security contexts: you would have to make pg_dump write out the
 different tuples under different users.  So you'd have more than one
 data object in the dump output for each table, one for every existing
 security context.  This seems extremely difficult and intrusive, however.
 
 It seems that having explicit security contexts in statements is
 necessary for this area to be reasonably simple.

Yes, it may be possible to restore the tables without statement support,
if we switch the OS user's privileges for each table, but that would be
unreasonable and unrealistic.


 Now, let's assume that COPY data includes the security context for each
 tuple in the output.

When we support row-level security, it will be necessary to back up and
restore the security context for each tuple.
What I'm talking about is how we specify the security context of new
tables. If we have statement support, it will be specified as follows:

  CREATE TABLE t ( a int primary key, b text)
  SECURITY_CONTEXT = 'system_u:object_r:sepgsql_ro_table_t:unclassified';


 How is that data restored?  Would you need to
 grant super-SEPostgres privileges to the user restoring the data?

The backup needs to be restored by a user who has at least the privileges
to create the dumped database objects, but there is no need for a
super-privilege.

For example, if all the dumped tables are labeled either unclassified or
classified, but not secret, the only privileges needed are to create
unclassified and classified tables, not secret tables.

However, I expect that an unconfined domain will do the backup/restore
work, except on especially secure systems. I don't think the default
security policy (such as the one bundled with Fedora) should restrict the
privileges of a DBA connected from a shell process.

Thanks,
--
KaiGai Kohei kai...@kaigai.gr.jp



Re: [HACKERS] SE-PgSQL developer documentation (Re: Reworks for Access Control facilities (r2363))

2009-10-28 Thread Alvaro Herrera
KaiGai Kohei escribió:
 Alvaro Herrera wrote:

 Now, let's assume that COPY data includes the security context for each
 tuple in the output.
 
 When we support row-level security, it will be necessary to back up and
 restore the security context for each tuple.

Oh, right, that part is being left out.  Sorry.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Kevin Grittner
Forgive me for jumping in again on discussion of a feature I might
never use, but as an outside observer something doesn't make sense
to me.
 
Josh Berkus j...@agliodbs.com wrote:
 
 If you require that a tool (or SET PERSISTENT) parse through a file
 in order to change one setting, then you've just doubled or tripled
 the code size of the tool, as well as added a host of failure
 conditions which wouldn't have existed otherwise.
 
Not if there is one implementation, distributed with PostgreSQL.  Give
it a clean API and a command-line application (for scripting in non-C
languages) and this is a non-issue.  This really seems like a red
herring.
 
I know it would be more lines in C than a bash script; but really,
think about how little work this would be for any script which has
grep and sed available -- at least if you assume it shouldn't follow
include statements.  But I think that's where the rub is -- when you
have more than one source for information, what's the precedence? 
That question doesn't go away with the proposed feature.  It seems
that in reading this thread I've seen a lot of conflicting notions on
how it *should* work, with a handwavy assertion that it doesn't matter
because the DBA can sort it all out.  But then will the tools always
do what people expect?
 
It seems like there's a significant base of users who want their
database product to self-configure; and there's clearly a significant
base of professional DBAs who want to be able to hand-tune for a
variety of reasons.  I assume that addressing these disparate needs is
one of the goals here?  As well as an easy way to drop in
configuration for additional features?  The directory seems to make
sense for the latter, but seems horrible to me for the former.  It
turns the risk of a spaghetti configuration file into a sorcerer's
apprentice collection of competing, conflicting files which are a
worse mess than the spaghetti.  It
 
Perhaps there should be two separate features for the two separate use
cases?
 
-Kevin



Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Alvaro Herrera
Greg Smith escribió:

 I was thinking that the algorithm would be something like: Read
 the old postgresql.conf and write it back out to a new file line
 by line
 
 This sounds familiar...oh, that's right, this is almost the same
 algorithm pgtune uses.  And it sucks, and it's a pain to convert the
 tool into C because of it, and the fact that you have to write this
 sort of boring code before you can do a single line of productive
 work is one reason why we don't have more tools available; way too
 much painful grunt work to write.

Huh, isn't this code in initdb.c already?  Since it's BSD-licensed (or
is it MIT?) you could just have lifted it.  Surely this isn't the reason
the tool isn't written in C.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Alvaro Herrera
Kevin Grittner wrote:

 But I think that's where the rub is -- when you
 have more than one source for information, what's the precedence? 

This is not a problem nowadays because that info is in pg_settings.
File name and line number.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Andrew Dunstan

Alvaro Herrera wrote:
 Greg Smith escribió:
 I was thinking that the algorithm would be something like: Read
 the old postgresql.conf and write it back out to a new file line
 by line
 
 This sounds familiar...oh, that's right, this is almost the same
 algorithm pgtune uses.  And it sucks, and it's a pain to convert the
 tool into C because of it, and the fact that you have to write this
 sort of boring code before you can do a single line of productive
 work is one reason why we don't have more tools available; way too
 much painful grunt work to write.
 
 Huh, isn't this code in initdb.c already?  Since it's BSD-licensed (or
 is it MIT?) you could just have lifted it.  Surely this isn't the reason
 the tool isn't written in C.

In any case, initdb has to be in C for portability reasons (I'm more 
aware of this than most ;-) ), but other tools don't unless the server 
has to rely on them.


cheers

andrew



Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Kevin Grittner
Alvaro Herrera alvhe...@commandprompt.com wrote:
 Kevin Grittner wrote:
 
 But I think that's where the rub is -- when you
 have more than one source for information, what's the precedence? 
 
 This is not a problem nowadays because that info is in pg_settings.
 File name and line number.
 
I'm talking about how the decision should be made as to which takes
precedence.  It's fine to document which one *was* chosen, but that
doesn't eliminate the problem of conflicting settings making a mess. 
Someone else (Robert maybe?) gave an explicit example of how three
files could have overlapping settings.  Of course, *my* tool will name
its configuration file !.conf.
 
-Kevin



[HACKERS] Show schema size with \dn+

2009-10-28 Thread Anders Steinlein
Is there any interest in expanding \dn+ to show schema size, similar  
to table sizes using \dt+ in 8.4? We use separate schemas for each  
user, so this would allow us to quickly look up the sizes of each  
user's data.
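
In the meantime, something like the following catalog query can
approximate what such a \dn+ extension might compute (a sketch only;
pg_total_relation_size() already includes a table's indexes and TOAST
data, so summing over plain tables avoids double counting):

```sql
-- Approximate per-schema on-disk size by summing over plain tables.
SELECT n.nspname AS schema,
       pg_size_pretty(sum(pg_total_relation_size(c.oid))::bigint) AS size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
GROUP BY n.nspname
ORDER BY sum(pg_total_relation_size(c.oid)) DESC;
```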


I have little experience with C and none with the PostgreSQL code base  
-- where should I look to have a go at this?


-- a.



[HACKERS] FOR UPDATE versus outer joins

2009-10-28 Thread Tom Lane
While I'm fooling with the FOR UPDATE code ...

Currently, you can't apply FOR UPDATE to a relation that's on the inner
side of an outer join, eg

regression=# select * from a left join b using(aa) for update;
ERROR:  SELECT FOR UPDATE/SHARE cannot be applied to the nullable side of an 
outer join

It would be a trivial code change to make this work by just not locking
any row in b when a null-extended join row is formed.  The argument
against that was that FOR UPDATE is supposed to guarantee that the same
rows can be fetched again, and it's not clear that the same
null-extended row would be formed if the join were repeated.  In
particular, if the a row's join key has been changed by a concurrent
update, we would still return the null-extended row, but there might now
be rows in b that it can join to; which we won't find since we aren't
repeating the whole join but only rechecking particular join pairs.

Do people still find that argument convincing, or would it be better to
remove the restriction and let the code do the best it can?  It seems to
me that allowing FOR UPDATE with outer joins may be more useful than not
allowing it, even given the caveat.

regards, tom lane



Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Greg Smith

On Wed, 28 Oct 2009, Alvaro Herrera wrote:

 Huh, isn't this code in initdb.c already?

The sketched out design I have for a contrib/pgtune in C presumes that I'd 
start by refactoring the relevant bits from initdb into a library for both 
programs to use.  But the initdb code doesn't care about preserving 
existing values when making changes to them; it just throws in its new 
settings and moves along.  So what's there already handles only about
half of the annoying parts that most people would expect of a tuning
tool that reads the existing file and operates on it.


Also, I wouldn't be surprised to find that it chokes on some real-world 
postgresql.conf files.  The postgresql.conf.sample it's being fed is 
fairly pristine.  A tuning tool that intends to read any postgresql.conf 
it's fed can't always assume it's in exactly standard form.  I've recently 
started collecting complicated postgresql.conf lines that crashed my 
Python code, as people submit them in bug reports.  You might be 
surprised at all the places people put whitespace.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Tom Lane
Greg Smith gsm...@gregsmith.com writes:
 The sketched out design I have for a contrib/pgtune in C presumes that I'd 
 start by refactoring the relevant bits from initdb into a library for both 
 programs to use.  But the initdb code doesn't care about preserving 
 existing values when making changes to them; it just throws in its new 
 settings and moves along.  So what's there already only handles about half 
 the annoying parts most people would expect a tuning tool that reads the 
 existing file and operates on it to do.

 Also, I wouldn't be surprised to find that it chokes on some real-world 
 postgresql.conf files.  The postgresql.conf.sample it's being fed is 
 fairly pristine.

Indeed.  Why in the world are you looking at initdb?  The standard
reference for postgresql.conf-reading code, by definition, is
guc-file.l.  I think the odds of building something that works right,
without borrowing that same flex logic, are about nil.

regards, tom lane



Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Greg Stark
On Wed, Oct 28, 2009 at 7:33 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Greg Smith escribió:

 This sounds familiar...oh, that's right, this is almost the same
 algorithm pgtune uses.  And it sucks,

It's also a blatant violation of packaging rules for Debian if not
every distribution. If you edit the user's configuration file then
there's no way to install a modified default configuration file. You
can't tell the automatic modifications apart from the user's
modifications. So the user will get a prompt asking if he wants the
new config file or to keep his modifications which he never remembered
making.


-- 
greg



Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Greg Stark
On Wed, Oct 28, 2009 at 2:37 AM, Dimitri Fontaine
dfonta...@hi-media.com wrote:
 That's why I'm proposing the following API at file level:

That's exactly the same as putting them all in the same file, only a
different syntax. It still requires that any program understand what
every other program was trying to do.

 It's much simpler and more reliable to have each program generate a
 separate file.

 From the viewpoint of the program itself only. For the DBA, that soon
 becomes a nightmare because the same GUC could come from any number of
 tools, and the precedence rules, even explicit and as easy as
 alphanumeric ordering (in which locale, by the way?), make it error prone.

But the DBA *wants* to control those precedence rules. The automatic
software certainly can't unless they know what other automatic
software exists in the world -- or will exist in the future.

 I really want to insist on having only ONE location for settings from
 tools (all of them) and one location for manual/local editing.

 time it's generated. It doesn't have to worry about anything else
 parsing or making sense of the file except the database server itself.

 But it'll never know if the settings it just generated are superseded by
 some other tool's configuration file.

That's precisely what makes things simpler. The less each module has
to know about each other module the simpler and more reliable it will
be. I actually would suggest that they check the current source by
checking with postgres, just to give a warning.
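
The precedence model under discussion -- read the directory's files in
sorted name order, last setting wins, just as a later duplicate wins within
one file -- fits in a few lines. The filenames here are hypothetical:

```python
# Merge settings from a conf.d-style directory: files are applied in
# sorted (byte-wise) filename order, and the last value read for a
# setting wins.  Filenames and settings are illustrative only.
def effective_settings(files):
    """files: mapping of filename -> dict of settings."""
    merged = {}
    for name in sorted(files):   # fixed byte-wise sort avoids the "which locale?" trap
        merged.update(files[name])
    return merged

confd = {
    "10-autotune.conf": {"shared_buffers": "2GB", "work_mem": "16MB"},
    "90-dba-overrides.conf": {"work_mem": "64MB"},
}
print(effective_settings(confd))
# {'shared_buffers': '2GB', 'work_mem': '64MB'}
```
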

-- 
greg



Re: [HACKERS] Where's the docs?

2009-10-28 Thread Josh Berkus
On 10/27/09 9:25 PM, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
 OK, this is the genuine failure; the syntax is missing for column triggers:
 
 CREATE TRIGGER name { BEFORE | AFTER } { event [ OR ... ] }
 ON table [ FOR [ EACH ] { ROW | STATEMENT } ]
 EXECUTE PROCEDURE function_name ( arguments )
 
 It's embedded in event, which isn't spelled out here.
 

Yeah, I couldn't figure it out from the docs, which means that other
people won't be able to.  Doc patch coming if I ever finish this server
migration.

--Josh Berkus



Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Josh Berkus
Kevin,

 I'm talking about how the decision should be made as to which takes
 precedence.  It's fine to document which one *was* chosen, but that
 doesn't eliminate the problem of conflicting settings making a mess. 
 Someone else (Robert maybe?) gave an explicit example of how three
 files could have overlapping settings.  Of course, *my* tool will name
 its configuration file !.conf.

Hey, if a DBA wants to do that, then it's fine with me.  They can check
pg_settings afterwards to find out which was chosen.

The precedence issues you (and Robert) are citing are no different from
what we have currently in a single file.  I absolutely can't tell you
the number of hacked-up postgresql.conf files I've seen with the same
setting appearing in more than 3 places.  And with the conf file being
over 1000 lines long, it's easy to miss that someone or some tool added
another instance of the variable at the bottom.
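
A tool that merely *reports* the duplicates is far easier than one that
rewrites the file; a rough sketch (a naive line-splitter, not a real
parser):

```python
from collections import defaultdict

# Report settings that appear more than once in a conf file, with line
# numbers -- the "same setting in three places" problem made visible.
def find_duplicates(text):
    seen = defaultdict(list)
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue                                  # skip blanks and comments
        name = stripped.split("=")[0].split()[0].lower()
        seen[name].append(lineno)
    return {name: lines for name, lines in seen.items() if len(lines) > 1}

conf = """\
work_mem = 4MB
shared_buffers = 128MB
# added during an incident:
work_mem = 256MB
work_mem = 64MB          # added by a tuning tool
"""
print(find_duplicates(conf))   # {'work_mem': [1, 4, 5]}
```
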

Plus we already support includes of single files.  Why is an include of
a directory controversial?  If someone doesn't want to use it, they
don't have to.

If someone here thinks writing a tool which reliably parses and
re-writes a hand-written postgresql.conf and runs on all the OSes we
support is *easy*, then please write it for me!  I'll happily use such a
tool.  But after wasting a couple dozen hours on the problem, I won't
write one.

Otherwise, please let us have our directory so that we can experiment
with easy-to-write-and-revise autoconfig tools.

--Josh Berkus



Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Greg Smith

On Wed, 28 Oct 2009, Tom Lane wrote:

Why in the world are you looking at initdb?  The standard reference for 
postgresql.conf-reading code, by definition, is guc-file.l.  I think the 
odds of building something that works right, without borrowing that same 
flex logic, are about nil.


initdb was the only sample around that actually makes changes to the 
postgresql.conf.  It's also a nice simple standalone program that's easy 
to borrow pieces from, which guc-file.l is not.  That's the reason it 
looks tempting at first.


If, as you say, the only right way to do this is to use the flex logic, that 
just reinforces how high the bar is for someone who wants to write a tool 
that modifies the file.  Periodically we get people who show up saying, 
"hey, I'd like to write a little [web|cli|gui] tool to help people update 
their postgresql.conf file", and when the answer they get includes "first 
you need to implement this grammar..." that scares off almost all of 
them.  It didn't work on me because I used to write compilers for fun 
before flex existed.  But even I just skimmed it and pragmatically wrote a 
simpler postgresql.conf parser implementation that worked well enough to 
get a working prototype out the door, rather than properly implementing 
the whole grammar.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Greg Smith

On Wed, 28 Oct 2009, Greg Stark wrote:


It's also a blatant violation of packaging rules for Debian if not
every distribution. If you edit the user's configuration file then
there's no way to install a modified default configuration file. You
can't tell the automatic modifications apart from the user's
modifications. So the user will get a prompt asking if he wants the
new config file or to keep his modifications which he never remembered
making.


The postgresql.conf file being modified is generated by initdb, and it's 
already being customized per install by the initdb-time rules like 
detection for maximum supported shared_buffers. It isn't one of the files 
installed by the package manager where the logic you're describing kicks 
in.  The conflict case would show up, to use a RHEL example, if I edited a 
/etc/sysconfig/postgresql file and then a changed version of that file 
appeared upstream.  Stuff in PGDATA is all yours and not tracked as a 
config file.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Kevin Grittner
Josh Berkus j...@agliodbs.com wrote:
 
 The precedence issues you (and Robert) are citing are no different
 from what we have currently in a single file.
 
I think that's *why* we're mentioning it.  This would seem to be the
juncture to look for ways to improve that, not just settle for no
worse -- but perhaps that's not possible.
 
 If someone here thinks writing a tool which reliably parses and
 re-writes a hand-written postgresql.conf and runs on all the OSes we
 support is *easy*, then please write it for me!  I'll happily use
 such a tool.  But after wasting a couple dozen hours on the problem,
 I won't write one.
 
Perhaps the ease of writing something like that with sed or perl has
caused me to underestimate the effort required in C.  I am curious
whether you actually mean that, or said it for rhetorical effect.
 
 Otherwise, please let us have our directory so that we can
 experiment with easy-to-write-and-revise autoconfig tools.
 
Well, I wouldn't vote against it since it seems to do me no harm; I
was just confused at the repeated assertion that update-in-place was
such a hard problem.  Some of the people saying that seem to regularly
eat problems which seem much harder than that (to me, anyway) for
lunch.  That seemed to suggest there could be other reasons for
wanting the directory approach which weren't getting proper focus.  If
we solve the wrong problem, the solution is likely to be suboptimal
for the real issues.
 
-Kevin



Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Tom Lane
Greg Smith gsm...@gregsmith.com writes:
 If as you say the only right way to do this is to use the flex logic, that 
 just reinforced how high the bar is for someone who wants to write a tool 
 that modifies the file.

Yup, exactly.  Personally I think that trying to auto-modify
postgresql.conf is insane.  The whole and entire reason behind this
discussion is that we want the tools modifying OTHER files, for which
we will establish different and much simpler rules for what is allowed.

regards, tom lane



Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Greg Stark
On Wed, Oct 28, 2009 at 10:28 AM, Greg Smith gsm...@gregsmith.com wrote:
 The postgresql.conf file being modified is generated by initdb, and it's
 already being customized per install by the initdb-time rules like detection
 for maximum supported shared_buffers. It isn't one of the files installed by
 the package manager where the logic you're describing kicks in.  The
 conflict case would show up, to use a RHEL example, if I edited a
 /etc/sysconfig/postgresql file and then a changed version of that file
 appeared upstream.  Stuff in PGDATA is all yours and not tracked as a config
 file.

Well, putting configuration files in PGDATA is itself a packaging
violation. I'm talking about /etc/postgresql.conf. Yes, it's possible
for packages to simply opt out of configuration file management,
which at least means they're not actively causing problems -- but it's
a cheat; it means giving up on providing the user with useful
upgrades of configuration files.

-- 
greg



Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Josh Berkus
Kevin,

 Perhaps the ease of writing something like that with sed or perl has
 caused me to underestimate the effort required in C.  I am curious
 whether you actually mean that, or said it for rhetorical effect.

I actually mean that.  It *looks* easy in perl, and in fact *is* easy
for *your* postgresql.conf which you control.  But writing a parser for
every postgresql.conf which exists in the world, no matter how someone
has hacked it up creatively?  No matter how they've handled upgrades?
For every version of PostgreSQL?  That requires writing a full parser
with grammar and near-Turing capabilities.

 Well, I wouldn't vote against it since it seems to do me no harm; I
 was just confused at the repeated assertion that update-in-place was
 such a hard problem. 

It's the basic and unsolvable issue of how do you have a file which is
both perfectly human-readable-and-editable *and* perfectly
machine-readable-and-editable at the same time.

--Josh Berkus




Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-10-28 Thread Hannu Krosing
On Wed, 2009-02-11 at 11:22 -0500, Tom Lane wrote:
 Asko Oja asc...@gmail.com writes:
  Did this change hashtext() visible to users? We have been using it quite
  widely for partitioning our databases. If so then it should be marked quite
  visibly in release notes as there might be others who will be hit by this.
 
 The hash functions are undocumented, have changed in the past, and are
 likely to change again in the future.  If you are using them in a way
 that depends on them to give the same answers across versions, you'd
 better stop.

Is at least the fact that they are undocumented, have changed in the
past, and are likely to change again in the future documented?

I'm sure this is something that has hit unwary users in the past and
will hit again in the future, so some words about it in the doc's would
be appropriate.

A search for hashtext on
http://www.postgresql.org/docs/8.4/interactive/index.html returned no
results, so I guess even their "undocumented, will surprise you" status
is not documented.

Hashing is a quite fundamental thing in computing, so I was quite
surprised to find out it had silently changed. 
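
The partitioning hazard here is easy to illustrate without PostgreSQL at
all. Below, md5 and sha1 are stand-ins for the old and new hashtext, not
the real functions:

```python
import hashlib

# If an application routes rows to partitions by hash value, changing the
# hash function between versions silently changes where most keys land.
def partition_v1(key, nparts):
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % nparts

def partition_v2(key, nparts):   # the "new" hash after an upgrade
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % nparts

moved = sum(partition_v1(f"key-{i}", 8) != partition_v2(f"key-{i}", 8)
            for i in range(1000))
print(f"{moved} of 1000 keys map to a different partition")
```

With 8 partitions, roughly 7 out of 8 keys move; data partitioned by the
old hash is effectively scrambled under the new one.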

I had never checked the docs for hash functions, but I had assumed that
internal functions are prefixed by pg_ and anything else is public,
free-to-use functionality.

Changing hash functions also makes in-place upgrades a lot harder, as
they can't be done incrementally anymore for tables which use hash
indexes.


-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Greg Stark
On Wed, Oct 28, 2009 at 12:08 PM, Josh Berkus j...@agliodbs.com wrote:
 Perhaps the ease of writing something like that with sed or perl has
 caused me to underestimate the effort required in C.  I am curious
 whether you actually mean that, or said it for rhetorical effect.

 I actually mean that.  It *looks* easy in perl, and in fact *is* easy
 for *your* postgresql.conf which you control.  But writing a parser for
 every postgresql.conf which exists in the world, no matter how someone
 has hacked it up creatively?  No matter how they've handled upgrades?
 For every version of PostgreSQL?  That requires writing a full parser
 with grammar and near-turing capabilities.

I think we're getting distracted by the basic parsing questions. These
are actually solvable -- pgadmin solves them today even.

I think the bigger problem is the semantic questions. If I've lowered
random_page_cost and your tool decides it should raise
sequential_page_cost should it raise random_page_cost proportionally
from my setting or to the absolute value it calculates? When it does
will I be annoyed to see my settings overwritten? What if I set some
of the cpu_* parameters based on my random_page_cost setting and now
that you've overwritten my random_page_cost setting they're all out of
whack?

And not all programs editing these files will be equally intelligent.
Say I post a snippet of configuration and say to drop it in wholesale
into your postgresql.conf.d. Then the user runs autotune which drops a
configuration in after it which overrides those settings. Then later I
post an updated snippet and the user replaces the original snippet. If
they're in separate files and he put the snippet in before the
autotune configuration then he doesn't have to worry if the new
snippet contains things which autotune overrode. They'll still
override the new settings.

If you keep them separate then the actual settings may not be in sync
but at least I can see each group of settings and understand what they
were trying to do.  The precedence remains the same. It also means the
database could give warnings if files are overriding earlier files if
we want that.

-- 
greg



Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 Kevin,
 Perhaps the ease of writing something like that with sed or perl has
 caused me to underestimate the effort required in C.  I am curious
 whether you actually mean that, or said it for rhetorical effect.

 I actually mean that.  It *looks* easy in perl, and in fact *is* easy
 for *your* postgresql.conf which you control.  But writing a parser for
 every postgresql.conf which exists in the world, no matter how someone
 has hacked it up creatively?  No matter how they've handled upgrades?

The issue isn't even with writing a parser.  The conf file is certainly
machine-parsable; guc-file.l is an existence proof, and the relatively
slow rate of change of that file indicates that it's been a reasonably
stable format over time.  The issue is that if you want to modify the
file while preserving comments, commenting out superseded entries,
putting new entries where the user would expect to find them, etc etc,
you have a hard AI problem in front of you.  This is why Robert keeps
harping on the default commentary being a problem --- if you removed all
comments (and didn't care about ordering etc), it would be easier.
But short of telling people who prefer $EDITOR to get lost, that's not
going to work.

I think the point of the discussions here is that we want to establish
an alternate config file (or set of config files) in which the
expectations are different: no promise to preserve any comments, no
intention to be human-friendly for editing, etc.  In one sense this
would be the same machine-readable format, since the backend is still
going to parse it with guc-file.l; but changing the human expectations
can make the editing problem much simpler.
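
Under that contract, updating the tool-owned file degenerates to "read it
into a dict, change the dict, rewrite the whole file". A minimal sketch;
the quoting and layout rules here are illustrative, not a spec:

```python
# Machine-owned config file: no comments to preserve, stable sorted
# layout, so tools can rewrite it wholesale instead of patching lines.
def render(settings):
    return "".join(f"{k} = '{v}'\n" for k, v in sorted(settings.items()))

def load(text):
    out = {}
    for line in text.splitlines():
        if "=" in line:
            k, v = line.split("=", 1)
            out[k.strip()] = v.strip().strip("'")
    return out

state = load("shared_buffers = '1GB'\n")
state["work_mem"] = "32MB"        # a tool applies its change
print(render(state), end="")
# shared_buffers = '1GB'
# work_mem = '32MB'
```
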

regards, tom lane



Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-10-28 Thread Tom Lane
Hannu Krosing ha...@2ndquadrant.com writes:
 I had never checked the docs for hash functions, but I had assumed, that
 internal functions are prefixed by pg_ and anything else is public, free
 to use functionality.

Sure, it's free to use.  It's not free to assume that we promise never
to change it.

 Changing hash functions also makes in-place upgrades a lot harder, as
 they can't be done incrementally anymore for tables which use hash
 indexes.

Hash indexes are so far from being production-grade that this argument
is not significant.

regards, tom lane



Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-10-28 Thread Kenneth Marshall
On Wed, Oct 28, 2009 at 03:31:17PM -0400, Tom Lane wrote:
 Hannu Krosing ha...@2ndquadrant.com writes:
  I had never checked the docs for hash functions, but I had assumed, that
  internal functions are prefixed by pg_ and anything else is public, free
  to use functionality.
 
 Sure, it's free to use.  It's not free to assume that we promise never
 to change it.
 
  Changing hash functions also makes in-place upgrades a lot harder, as
  they can't be done incrementally anymore for tables which use hash
  indexes.
 
 Hash indexes are so far from being production-grade that this argument
 is not significant.
 
   regards, tom lane

In addition, the change from 8.3 to 8.4 to store only the hash and not
the value in the index means that a reindex would be required in any event.

Cheers,
Ken



Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-10-28 Thread Tom Lane
Kenneth Marshall k...@rice.edu writes:
 On Wed, Oct 28, 2009 at 03:31:17PM -0400, Tom Lane wrote:
 Hash indexes are so far from being production-grade that this argument
 is not significant.

 In addition, the change from 8.3 to 8.4 to store only the hash and not
 the value in the index means that a reindex would be required in any event.

Indeed, and I fully expect there will be some more on-disk format
changes required before we get to the point where hash indexes are
actually interesting for production.  If we start insisting that they
be in-place-upgradable now, we will pretty much guarantee that they
never become useful enough to justify the restriction :-(

(As examples, the hash bucket size probably needs revisiting,
and we ought to think very hard about whether we shouldn't switch
to 64-bit hash values.  And that's not even considering some of the
more advanced suggestions that have been made.)

regards, tom lane



Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-10-28 Thread Jeff Davis
On Wed, 2009-10-28 at 21:09 +0200, Hannu Krosing wrote:
 Is at least the fact that they are undocumented, have changed in the
 past, and are likely to change again in the future documented ?

That's a little confusing to me: how do we document that something is
undocumented? And where do we stop?

 Hashing is a quite fundamental thing in computing, so I was quite
 surprised to find out it had silently changed. 

There are many reasons to use a hash, and we don't want people to use
these functions for the wrong purpose. I have seen people use a
performance hash for security purposes before, and I had to demonstrate
some hash collisions to show why that was a bad idea. So, if we do
provide documented functions, it should be done carefully.

Trying to develop and document a set of standardized, stable hash
functions covering a wide range of possible use cases sounds like it may
be better served by an extension.

Regards,
Jeff Davis 




Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Greg Smith

On Wed, 28 Oct 2009, Josh Berkus wrote:


It's the basic and unsolvable issue of how do you have a file which is
both perfectly human-readable-and-editable *and* perfectly
machine-readable-and-editable at the same time.


Let's see...if I remember correctly from the last two rounds of this 
discussion, this is the point where someone pops up and says that 
switching to XML for the postgresql.conf will solve this problem. 
Whoever does that this time goes into the ring with Kevin and me, but they 
don't get a club.  (All fight proceeds go to benefit SPI, of course.)


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Robert Haas
On Wed, Oct 28, 2009 at 3:28 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Josh Berkus j...@agliodbs.com writes:
 Kevin,
 Perhaps the ease of writing something like that with sed or perl has
 caused me to underestimate the effort required in C.  I am curious
 whether you actually mean that, or said it for rhetorical effect.

 I actually mean that.  It *looks* easy in perl, and in fact *is* easy
 for *your* postgresql.conf which you control.  But writing a parser for
 every postgresql.conf which exists in the world, no matter how someone
 has hacked it up creatively?  No matter how they've handled upgrades?

 The issue isn't even with writing a parser.  The conf file is certainly
 machine-parsable; guc-file.l is an existence proof, and the relatively
 slow rate of change of that file indicates that it's been a reasonably
 stable format over time.

Right.

 The issue is that if you want to modify the
 file while preserving comments, commenting out superseded entries,
 putting new entries where the user would expect to find them, etc etc,
 you have a hard AI problem in front of you.

Right.  In other words, it's not possible.  You can drive yourself
crazy trying to approximate it, but it is a hopeless waste of time.

 This is why Robert keeps
 harping on the default commentary being a problem --- if you removed all
 comments (and didn't care about ordering etc), it would be easier.

Yes - and we even had some consensus that this was a good idea, at one
point.  Maybe not complete, precise consensus on every detail, but
certainly enough to have a conversation about it.

 But short of telling people who prefer $EDITOR to get lost, that's not
 going to work.

This is where I get off the train.  Preferring $EDITOR is not the same
thing as feeling that we need 500 lines of comments in the default
file.  There may be some people who hold both opinions, of course.

 I think the point of the discussions here is that we want to establish
 an alternate config file (or set of config files) in which the
 expectations are different: no promise to preserve any comments, no
 intention to be human-friendly for editing, etc.  In one sense this
 would be the same machine-readable format, since the backend is still
 going to parse it with guc-file.l; but changing the human expectations
 can make the editing problem much simpler.

I don't think this idea is without merit, but I don't think it's a
silver bullet, either.  If you can change the human expectations for
some file that gets processed along with postgresql.conf, you can
change the expectations for postgresql.conf itself.  In fact, you'll
have to: adding more files is BY DEFINITION going to change the
interpretation of postgresql.conf.  It will either be the magic bullet
file that overrides the other file, or vice versa - rather than, as
is the case in a default install today, being THE configuration file.

One of the issues that we need to face is: how many new files?  There
seems to be some sentiment that we can just drop in new files and
expect things to work.  I think that's likely to lead to chaos.
Having TWO files - one for $EDITOR and one for tools - may be
manageable.  There will be some user confusion as to which one is in
charge, but there are only two choices, so maybe it's not too bad.
But having a whole directory full of files emitted by different tools
sounds like a disaster, and therefore it seems to me that there is no
getting around the need to have a tool which can merge new settings
into an existing configuration file.

It would be completely logical to break up the configuration file into
subfiles by TOPIC.  That would complicate things for tool-writers
because they would need to get each setting into the proper file, and
we currently don't have any infrastructure for that.  But that's not
why people want this feature anyway.  What they want is to be able to
deposit new settings and have them take effect without parsing a
config file.  But they can do that today.  Just open postgresql.conf
for append, write a newline in case the file didn't already end with
one, write your settings, and close the file.  Your settings win
because they are last.  The problem is - now you've left a mess for
someone else to clean up.  Over time, duplicates will accumulate.  The
order of settings won't be preserved.  Comments won't be properly
updated.  But writing to another file doesn't actually fix any of
that.  Merging settings (either in postgresql.conf or in a separate
persistent.conf) does, at least in part.
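
That append recipe, made concrete (an in-memory StringIO stands in for
postgresql.conf; the duplicate pile-up is the point):

```python
import io

# "Appending wins" works mechanically -- the last setting read takes
# effect -- but every run leaves another stale line behind.
def append_setting(f, name, value):
    contents = f.getvalue()
    f.seek(0, io.SEEK_END)
    if contents and not contents.endswith("\n"):
        f.write("\n")             # in case the file didn't end with one
    f.write(f"{name} = {value}\n")

conf = io.StringIO("work_mem = 4MB")   # note: no trailing newline
append_setting(conf, "work_mem", "64MB")
append_setting(conf, "work_mem", "128MB")
print(conf.getvalue())
# work_mem = 4MB
# work_mem = 64MB
# work_mem = 128MB   <- wins; the two lines above are now dead weight
```
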

...Robert



Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Josh Berkus

 Let's see...if I remember correctly from the last two rounds of this
 discussion, this is the point where someone pops up and says that
 switching to XML for the postgresql.conf will solve this problem.
 Whoever does that this time goes into the ring with Kevin and I, but
 they don't get a club.  (All fight proceeds to benefit SPI of course).

XML is so last-week.  JSON!

Oh, wait, we're PostgreSQL, we're not that mainstream.  YAML!

--Josh



Re: [HACKERS] inefficient use of relation extension?

2009-10-28 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:

  I regularly (several times a day) see five or six processes all with
  pg_locks locktype=extend granted=f on the same table, waiting for a long
  time.
 
 I'm not sure what's causing that, but I *seriously* doubt that adding
 new pages to FSM right away would make it better.  What it sounds like
 is someone is getting hung up while holding the lock.  You should try to
 investigate who's got the lock when this happens, and what they're doing
 or waiting for.

After some more research, these facts have arisen:

- the relation extension lock in question is on a toast table
- the entries stored in that table are long enough that they need more
than one page
- the BufMappingLock is seen as severely contended among the various
processes trying to extend the table

So the dozen-plus processes fight over the BufMapping lwlock while
attempting to extend the toast table, to get a free buffer for the new
page; they do this one page at a time, and then go back and do the same
thing over and over.
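For anyone wanting to reproduce this diagnosis, a query along these lines (a sketch against pg_locks; column names as in recent releases) shows who is waiting on, and who holds, the extension lock for each relation:

```sql
-- List relation-extension lock waiters and holders (sketch).
SELECT c.relname, l.pid, l.granted
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
WHERE l.locktype = 'extend'
ORDER BY c.relname, l.granted DESC;
```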

shared_buffers is large (4.6 GB), and I'm not clear whether this is just
inefficiency in the 8.1 bufmgr code, or whether the bgwriter is not
aggressive enough about clearing pages from the LRU end of the buffer pool.

I'm not really sure what the right way to attack this problem is, but
getting off 8.1 has now become a priority.

sinval was tracked too and we found out that it's not an issue at all.

Thanks everyone.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] alpha2 bundled -- please verify

2009-10-28 Thread Devrim GÜNDÜZ
On Wed, 2009-10-21 at 23:26 +0300, Peter Eisentraut wrote:
 Alpha2 has been bundled and is available at
 
 http://developer.postgresql.org/~petere/alpha/
 
 Please check that it is sane.

Peter, could you please provide an md5sum for the alpha2 tarball? I know
we missed it in alpha1. It would be better if the original packager
uploaded the md5sum.

Also, I'd be happy if you could upload a .bz2 file too (for the lazy RPM
packagers who do not want to play with their spec files a lot).

Regards,
-- 
Devrim GÜNDÜZ, RHCE
Command Prompt - http://www.CommandPrompt.com 
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz




Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Robert Haas
On Wed, Oct 28, 2009 at 4:24 PM, Josh Berkus j...@agliodbs.com wrote:

 Let's see...if I remember correctly from the last two rounds of this
 discussion, this is the point where someone pops up and says that
 switching to XML for the postgresql.conf will solve this problem.
 Whoever does that this time goes into the ring with Kevin and I, but
 they don't get a club.  (All fight proceeds to benefit SPI of course).

 XML is so last-week.  JSON!

 Oh, wait, we're PostgreSQL, we're not that mainstream.  YAML!

Definitely ASN.1

...Robert



Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Greg Smith

On Wed, 28 Oct 2009, Robert Haas wrote:


It would be completely logical to break up the configuration file into
subfiles by TOPIC.  That would complicate things for tool-writers
because they would need to get each setting into the proper file, and
we currently don't have any infrastructure for that.


Already done:

# select name,category from pg_settings limit 1;
       name       |                             category
------------------+--------------------------------------------------------------------
 add_missing_from | Version and Platform Compatibility / Previous PostgreSQL Versions


You could make one per category, and pgtune for example already knows all 
this info.  The somewhat arbitrary category assignments Josh put things 
into are what Peter was complaining about upthread.  Questions like "is 
'effective_cache_size' a memory parameter or an optimizer one?" show why 
this is not trivial to do well.
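A sketch of how the catalog could drive such a per-topic split, grouping on the top-level category text exposed by pg_settings:

```sql
-- Count settings per top-level category; each group could become
-- its own configuration subfile.
SELECT split_part(category, ' / ', 1) AS topic,
       count(*) AS n_settings
FROM pg_settings
GROUP BY 1
ORDER BY n_settings DESC;
```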


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Robert Haas
On Wed, Oct 28, 2009 at 4:52 PM, Greg Smith gsm...@gregsmith.com wrote:
 On Wed, 28 Oct 2009, Robert Haas wrote:

 It would be completely logical to break up the configuration file into
 subfiles by TOPIC.  That would complicate things for tool-writers
 because they would need to get each setting into the proper file, and
 we currently don't have any infrastructure for that.

 Already done:

 # select name,category from pg_settings limit 1;
       name       |                             category
 ------------------+--------------------------------------------------------------------
  add_missing_from | Version and Platform Compatibility / Previous PostgreSQL Versions

 You could make one per category, and pgtune for example already knows all
 this info.  The somewhat arbitrary category assignments Josh put things into
 are what Peter was complaining about upthread.  Questions like is
 'effective_cache_size' a memory parameter or an optimizer one? show why
 this is not trivial to do well.

I stand corrected.  I think the basic thrust of the paragraph stands:
this is not why people are asking for the feature, or if it is, that
hasn't been articulated or discussed.

...Robert



Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-10-28 Thread Hannu Krosing
On Wed, 2009-10-28 at 12:51 -0700, Jeff Davis wrote:
 On Wed, 2009-10-28 at 21:09 +0200, Hannu Krosing wrote:
  Is at least the fact that they are undocumented, have changed in the
  past, and are likely to change again in the future documented ?
 
 That's a little confusing to me: how do we document that something is
 undocumented? And where do we stop?

My previous e-mail message documents that the undocumentedness is
undocumented, so no need to go any further here ;)

Though undocumented, the hash functions are easily discoverable by doing

\df *hash*

in psql.
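For example (the functions are SQL-callable even though undocumented; the return values are deliberately not shown here, since their changing across versions is exactly the point of this thread):

```sql
-- Call two of the undocumented hash functions directly.
SELECT hashtext('hello'), hashint4(42);
```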

  Hashing is a quite fundamental thing in computing, so I was quite
  surprised to find out it had silently changed. 
 
 There are many reasons to use a hash, and we don't want people to use
 these functions for the wrong purpose. 

I don't think that not documenting a hash function helps here at all.

 I have seen people use a
 performance hash for security purposes before, and I had to demonstrate
 some hash collisions to show why that was a bad idea. 

I've seen people use CRC32 as a hash and then hit a collision within 15
tries with quite large keys.

 So, if we do provide documented functions, it should be done carefully.

Any user-visible behavior change should be done carefully, even if the
original behavior is not documented.

A careful upgrade of the hashxxx functions would have kept the old
functions and introduced the new ones with a _v2 suffix, then used those
in the appropriate places; then kept the old ones for a few versions,
with maybe a deprecation warning; and then moved them to contrib for a
few more versions.

Doing it this way would leave them undocumented and still not break
people's applications in mysterious ways.

 Trying to develop and document a set of standardized, stable hash
 functions covering a wide range of possible use cases sounds like it may
 be better served by an extension.

I guess there are enough security/crypt/strong hashes in pgcrypto
package so that should not be a problem.

-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Andrew Dunstan



Greg Smith wrote:

On Wed, 28 Oct 2009, Josh Berkus wrote:


It's the basic and unsolvable issue of how do you have a file which is
both perfectly human-readable-and-editable *and* perfectly
machine-readable-and-editable at the same time.


Let's see...if I remember correctly from the last two rounds of this 
discussion, this is the point where someone pops up and says that 
switching to XML for the postgresql.conf will solve this problem. 
Whoever does that this time goes into the ring with Kevin and I, but 
they don't get a club.  (All fight proceeds to benefit SPI of course).





That's precisely why I didn't get into this discussion (you guys are scary).

It really does seem like deja vu all over again. I'm usually good for a 
given argument once.


cheers

andrew



Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-10-28 Thread Hannu Krosing
On Wed, 2009-10-28 at 15:31 -0400, Tom Lane wrote:
 Hannu Krosing ha...@2ndquadrant.com writes:
  I had never checked the docs for hash functions, but I had assumed, that
  internal functions are prefixed by pg_ and anything else is public, free
  to use functionality.
 
 Sure, it's free to use.  It's not free to assume that we promise never
 to change it.

  Changing hash functions also makes in-place upgrades a lot harder, as
  they can't be done incrementally anymore for tables which use hash
  indexes.
 
 Hash indexes are so far from being production-grade that this argument
 is not significant.

AFAIK in-place upgrade is also not quite production-grade, so this was
meant as a forward-looking note for next time the hashxxx functions will
change.

   regards, tom lane
-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [HACKERS] Parsing config files in a directory

2009-10-28 Thread Simon Riggs
On Tue, 2009-10-27 at 00:38 -0400, Greg Smith wrote:
 new feature

One additional point that would be useful is a way to match up the usage
of custom_variable_classes with this new style of .conf file processing.

At the moment if you wish to add a custom variable class everybody needs
to edit the *same* parameter. Finding which one to edit could be a
little difficult with a whole directory to search in.

I propose a new form of processing for that variable: each new parameter
instance is appended to the previous value, rather than replacing it,
e.g.
custom_variable_class = 'x'
custom_variable_class = 'y'
custom_variable_class = 'z'
is equivalent to
custom_variable_classes = 'x,y,z'

That allows NewFeatureX to drop in a file called newfeaturex.conf,
which looks like this:

custom_variable_class = 'newfeaturex'
newfeaturex.param1 = x
newfeaturex.param2 = y
newfeaturex.param3 = z

This requires no editing of any other files, just a straight drop in.
That will make it much easier to produce real installers/deinstallers
for add-in modules.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Show schema size with \dn+

2009-10-28 Thread Guillaume Lelarge
On Wednesday, October 28, 2009, at 15:11:31, Anders Steinlein wrote:
 Is there any interest in expanding \dn+ to show schema size, similar
 to table sizes using \dt+ in 8.4? We use separate schemas for each
 user, so this would allow us to quickly look up the sizes of each
 user's data.
 
 I have little experience with C and none with the PostgreSQL code base
 -- where should I look to have a go at this?
 

I would say source file src/bin/psql/describe.c, function listSchemas.
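The SQL behind such a \dn+ column would be roughly this (a sketch; it sums pg_total_relation_size over ordinary tables only, since that function already includes each table's indexes and toast data):

```sql
SELECT n.nspname AS schema,
       pg_size_pretty(sum(pg_total_relation_size(c.oid))::bigint) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
GROUP BY n.nspname
ORDER BY sum(pg_total_relation_size(c.oid)) DESC;
```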


-- 
Guillaume.
 http://www.postgresqlfr.org
 http://dalibo.com



[HACKERS] WIP: push AFTER-trigger execution into ModifyTable node

2009-10-28 Thread Tom Lane
In http://archives.postgresql.org/message-id/26545.1255140...@sss.pgh.pa.us
I suggested that we should push the actual execution (not just queuing)
of non-deferred AFTER triggers into the new ModifyTable plan node.
The attached patch does that, and seems like a nice improvement since it
removes knowledge of trigger handling from a number of other places.
However the original objective was to allow EXPLAIN to associate trigger
runtimes with ModifyTable nodes, and I realized that this patch doesn't
accomplish that --- the trigger stats are still accumulated in the
executor-wide EState, not in the ModifyTable node.  Right at the moment
we could cheat and have EXPLAIN print the trigger stats under
ModifyTable anyway, because there can be only one ModifyTable in any
plan tree.  But that will fall down as soon as we try to let INSERT
RETURNING and friends execute within WITH clauses.

After poking around a bit I think it should be possible to keep the
trigger instrumentation data in the ModifyTable node instead of in
EState, and thereby allow EXPLAIN to know which node to blame the
trigger time on.  This will require passing an extra parameter down
from nodeModifyTable into the trigger code, but none of those call paths
are very long.  Still, it'll be a significantly more invasive patch by
the time it's done than what you see here.

So, before I go off and do that work: anybody have an objection to this
line of development?  The main implication of changing to this approach
is that we'll be nailing down the assumption that each WITH (command
RETURNING) clause acts very much like a separate statement for
trigger purposes: it will fire BEFORE STATEMENT triggers at start,
and AFTER STATEMENT triggers at end, and actually execute all
non-deferred AFTER triggers, before we move on to executing the next
WITH clause or the main query.
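As a concrete illustration of the semantics being nailed down, a hypothetical statement of this shape (the table and column names are invented, and this WITH syntax did not yet exist at the time) would run the DELETE's BEFORE/AFTER STATEMENT triggers and all its non-deferred AFTER ROW triggers to completion before the INSERT begins:

```sql
WITH moved AS (
    DELETE FROM queue WHERE processed RETURNING *
)
INSERT INTO archive SELECT * FROM moved;
```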

regards, tom lane

Index: src/backend/commands/explain.c
===
RCS file: /cvsroot/pgsql/src/backend/commands/explain.c,v
retrieving revision 1.192
diff -c -r1.192 explain.c
*** src/backend/commands/explain.c	12 Oct 2009 18:10:41 -	1.192
--- src/backend/commands/explain.c	28 Oct 2009 22:24:42 -
***
*** 19,25 
  #include "commands/defrem.h"
  #include "commands/explain.h"
  #include "commands/prepare.h"
- #include "commands/trigger.h"
  #include "executor/instrument.h"
  #include "optimizer/clauses.h"
  #include "optimizer/planner.h"
--- 19,24 
***
*** 354,363 
  
  	INSTR_TIME_SET_CURRENT(starttime);
  
- 	/* If analyzing, we need to cope with queued triggers */
- 	if (es->analyze)
- 		AfterTriggerBeginQuery();
- 
  	/* Select execution options */
  	if (es->analyze)
  		eflags = 0;				/* default run-to-completion flags */
--- 353,358 
***
*** 383,402 
  	ExplainPrintPlan(es, queryDesc);
  
  	/*
! 	 * If we ran the command, run any AFTER triggers it queued.  (Note this
! 	 * will not include DEFERRED triggers; since those don't run until end of
! 	 * transaction, we can't measure them.)  Include into total runtime.
  	 */
  	if (es->analyze)
  	{
- 		INSTR_TIME_SET_CURRENT(starttime);
- 		AfterTriggerEndQuery(queryDesc->estate);
- 		totaltime += elapsed_time(starttime);
- 	}
- 
- 	/* Print info about runtime of triggers */
- 	if (es->analyze)
- 	{
  		ResultRelInfo *rInfo;
  		bool		show_relname;
  		int			numrels = queryDesc->estate->es_num_result_relations;
--- 378,389 
  	ExplainPrintPlan(es, queryDesc);
  
  	/*
! 	 * Print info about runtime of triggers.  (Note this will not include
! 	 * DEFERRED triggers; since those don't run until end of transaction, we
! 	 * can't measure them.)
  	 */
  	if (es->analyze)
  	{
  		ResultRelInfo *rInfo;
  		bool		show_relname;
  		int			numrels = queryDesc->estate->es_num_result_relations;
Index: src/backend/commands/portalcmds.c
===
RCS file: /cvsroot/pgsql/src/backend/commands/portalcmds.c,v
retrieving revision 1.81
diff -c -r1.81 portalcmds.c
*** src/backend/commands/portalcmds.c	7 Oct 2009 16:27:18 -	1.81
--- src/backend/commands/portalcmds.c	28 Oct 2009 22:24:42 -
***
*** 264,270 
  			PG_TRY();
  			{
  CurrentResourceOwner = portal->resowner;
- /* we do not need AfterTriggerEndQuery() here */
  ExecutorEnd(queryDesc);
  FreeQueryDesc(queryDesc);
  			}
--- 264,269 
***
*** 371,377 
  		 * Now shut down the inner executor.
  		 */
  		portal->queryDesc = NULL;		/* prevent double shutdown */
- 		/* we do not need AfterTriggerEndQuery() here */
  		ExecutorEnd(queryDesc);
  		FreeQueryDesc(queryDesc);
  
--- 370,375 
Index: src/backend/commands/trigger.c
===
RCS file: /cvsroot/pgsql/src/backend/commands/trigger.c,v
retrieving revision 1.256
diff -c -r1.256 trigger.c
*** src/backend/commands/trigger.c	27 Oct 2009 20:14:27 -	

Re: [HACKERS] WIP: push AFTER-trigger execution into ModifyTable node

2009-10-28 Thread Marko Tiikkaja

Tom Lane wrote:

So, before I go off and do that work: anybody have an objection to this
line of development?  The main implication of changing to this approach
is that we'll be nailing down the assumption that each WITH (command
RETURNING) clause acts very much like a separate statement for
trigger purposes: it will fire BEFORE STATEMENT triggers at start,
and AFTER STATEMENT triggers at end, and actually execute all
non-deferred AFTER triggers, before we move on to executing the next
WITH clause or the main query.


As we've discussed before, WITH (.. RETURNING ..) is probably most
useful for moving rows from one table to another.  When you're moving a
lot of rows around, there is some point at which I believe this execution
strategy will be a lot slower than the traditional approach, due to
storing the RETURNING results on disk.  I've been thinking that in some
cases we could inline the CTE, which would be a quite significant
performance benefit, so I'm not too keen on the approach you're
suggesting.


Regards,
Marko Tiikkaja




Re: [HACKERS] WIP: push AFTER-trigger execution into ModifyTable node

2009-10-28 Thread Tom Lane
Marko Tiikkaja marko.tiikk...@cs.helsinki.fi writes:
 Like we've discussed before, WITH (.. RETURNING ..) is probably most
 useful for moving rows from one table to another.  When you're moving a
 lot of rows around, there's some point where I believe this execution
 strategy will be a lot slower than the traditional approach due to
 storing the RETURNING results on disk.  I've been thinking that in some
 cases we could inline the CTE for this to actually be a quite
 significant performance benefit, so I'm not too keen on the approach
 you're suggesting.

Well, this is what we need to nail down *now*.  Are we going to say that
use of WITH(RETURNING) means you forfeit all guarantees about order of
trigger firing?  Short of that, I don't believe that it is sane to think
about pipelining such things.  And if we do do that, it sounds like a
security hole to me, because the owner of the trigger isn't the one who
agreed to forfeit predictability.

regards, tom lane



Re: [HACKERS] WIP: push AFTER-trigger execution into ModifyTable node

2009-10-28 Thread Robert Haas
On Wed, Oct 28, 2009 at 8:45 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Marko Tiikkaja marko.tiikk...@cs.helsinki.fi writes:
 Like we've discussed before, WITH (.. RETURNING ..) is probably most
 useful for moving rows from one table to another.  When you're moving a
 lot of rows around, there's some point where I believe this execution
 strategy will be a lot slower than the traditional approach due to
 storing the RETURNING results on disk.  I've been thinking that in some
 cases we could inline the CTE for this to actually be a quite
 significant performance benefit, so I'm not too keen on the approach
 you're suggesting.

 Well, this is what we need to nail down *now*.  Are we going to say that
 use of WITH(RETURNING) means you forfeit all guarantees about order of
 trigger firing?  Short of that, I don't believe that it is sane to think
 about pipelining such things.  And if we do do that, it sounds like a
 security hole to me, because the owner of the trigger isn't the one who
 agreed to forfeit predictability.

I don't see why either behavior would be a security hole; we get to
define how the system behaves, and users have to write their triggers
to cope with that behavior.  We don't want to throw random roadblocks
in the way of sanity, but users are not entitled to assume that no
future major release of PG will have semantics that are in any way
different from whichever release they're now running, especially for
features that don't even exist in the current release.  If you have a
specific concern here, maybe you could provide an example.

To be honest, I'm not entirely comfortable with either behavior.
Pipelining a delete out of one table into an insert into another table
seems VERY useful to me, and I'd like us to have a way to do that.  On
the other hand, in more complex cases, the fact that the effects of a
statement are normally not visible to that statement could lead to
some fairly confusing behavior, especially when triggers are involved.
 So I don't really know what the right thing is.  What I really want
is to provide both behaviors, but I'm not sure there's any sensible
way to do that, and even if there were it's not clear to me that users
will know which one they want.

...Robert



[HACKERS] Syntax for partitioning

2009-10-28 Thread Itagaki Takahiro
I'd like to improve the partitioning feature in 8.5.
Kedar-san's previous work is wonderful, but I cannot see any updated patch.
http://archives.postgresql.org/message-id/bd8134a40906080702s96c90a9q3bbb581b9bd0d...@mail.gmail.com

So, I'll take over the work if no one else is going to do it.
I'm thinking of adding syntax support first. Table partitioning has been
proposed many times, but it is still not in core. The reason is that it
is too difficult to build a complete partitioning feature at once. I
think syntax support is a good start.

First, I will add syntax for CREATE TABLE, ALTER TABLE ADD/DROP PARTITION.
The syntax is borrowed from Oracle and MySQL. A notable characteristic
is the use of LESS THAN in range partitioning. The keyword PARTITION is
added to the fully reserved keyword list to support ADD/DROP PARTITION.

Those syntax constructs are merely sugar for INHERITS with CHECK: the
declarations are translated into CHECK constraints. I plan to adjust
pg_dump to dump partitioning definitions in the new format, but the
actual implementation will still be based on constraint exclusion. In
addition, hash partitioning is not implemented; the syntax is parsed,
but a not-implemented error is raised for now.

Here is syntax I propose:

ALTER TABLE table_name ADD PARTITION name ...;
ALTER TABLE table_name DROP PARTITION [IF EXISTS] name [CASCADE | RESTRICT];

Range partitioning:
  CREATE TABLE table_name ( columns )
PARTITION BY RANGE ( a_expr )
(
  PARTITION name VALUES LESS THAN [(] const [)],
  PARTITION name VALUES LESS THAN [(] MAXVALUE [)] -- overflow partition
);

List partitioning:
  CREATE TABLE table_name ( columns )
PARTITION BY LIST ( a_expr )
(
  PARTITION name VALUES [IN] ( const [, ...] ),
  PARTITION name VALUES [IN] [(] DEFAULT [)]   -- overflow partition
);

Hash partitioning:
  CREATE TABLE table_name ( columns )
PARTITION BY HASH ( a_expr )
PARTITIONS num_partitions;

  CREATE TABLE table_name ( columns )
PARTITION BY HASH ( a_expr )
(
  PARTITION name,
  ...
);

Note:
  * Each partition can have optional WITH (...) and TABLESPACE clauses.
  * '(' and ')' are optional to support both Oracle and MySQL syntax.
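Since the proposal is sugar over INHERITS plus CHECK, the range example above would desugar into something like this today (table and column names invented for illustration):

```sql
CREATE TABLE measurements (logdate date, value int);

-- PARTITION m2009 VALUES LESS THAN (DATE '2010-01-01')
CREATE TABLE m2009 (
    CHECK (logdate < DATE '2010-01-01')
) INHERITS (measurements);

-- PARTITION moverflow VALUES LESS THAN (MAXVALUE)  -- overflow partition
CREATE TABLE moverflow (
    CHECK (logdate >= DATE '2010-01-01')
) INHERITS (measurements);
```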


Comments welcome.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




Re: [HACKERS] Syntax for partitioning

2009-10-28 Thread Pavel Stehule
2009/10/29 Itagaki Takahiro itagaki.takah...@oss.ntt.co.jp:
 I'd like to improve partitioning feature in 8.5.
 Kedar-san's previous work is wonderful, but I cannot see any updated patch.
 http://archives.postgresql.org/message-id/bd8134a40906080702s96c90a9q3bbb581b9bd0d...@mail.gmail.com

 So, I'll take over the work if no one else is going to do it.
 I'm thinking to add syntax support first. Table partitioning was
 proposed many times, but it is still not applied into core.
 The reason is it is too difficult to make perfect partitioning
 feature at once. I think syntax support is a good start.

 First, I will add syntax for CREATE TABLE, ALTER TABLE ADD/DROP PARTITION.
 The syntax is borrowed from Oracle and MySQL. Their characteristics
 are using LESS THAN in range partitioning. The keyword PARTITION is
 added to the full-reserved keyword list to support ADD/DROP PARTITION.

 Those syntax constructs are merely sugar for INHERITS with CHECK. Declarations
 are translated into CHECK constraints. I have a plan to adjust pg_dump to
 dump definitions of partitioning in the correct format, but the actual
 implementation will be still based on constraint exclusion. In addition,
 hash partitioning is not implemented; the syntax is parsed but a
 not-implemented error is raised for now.

 Here is syntax I propose:
 
 ALTER TABLE table_name ADD PARTITION name ...;
 ALTER TABLE table_name DROP PARTITION [IF EXISTS] name [CASCADE | RESTRICT];

 Range partitioning:
  CREATE TABLE table_name ( columns )
    PARTITION BY RANGE ( a_expr )
    (
      PARTITION name VALUES LESS THAN [(] const [)],
      PARTITION name VALUES LESS THAN [(] MAXVALUE [)] -- overflow partition
    );

 List partitioning:
  CREATE TABLE table_name ( columns )
    PARTITION BY LIST ( a_expr )
    (
      PARTITION name VALUES [IN] ( const [, ...] ),
      PARTITION name VALUES [IN] [(] DEFAULT [)]       -- overflow partition
    );

 Hash partitioning:
  CREATE TABLE table_name ( columns )
    PARTITION BY HASH ( a_expr )
    PARTITIONS num_partitions;

  CREATE TABLE table_name ( columns )
    PARTITION BY HASH ( a_expr )
    (
      PARTITION name,
      ...
    );

 Note:
  * Each partition can have optional WITH (...) and TABLESPACE clauses.
  * '(' and ')' are optional to support both Oracle and MySQL syntax.
 

 Comments welcome.

+1

Pavel


 Regards,
 ---
 ITAGAKI Takahiro
 NTT Open Source Software Center



