Re: [PATCHES] pg_dump additional options for performance

2008-07-27 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Stephen Frost <[EMAIL PROTECTED]> writes:
  I dislike, and doubt that I'd use, this approach.  At the end of the
  day, it ends up processing the same (very large amount of data) multiple
  times.
 
 Well, that's easily avoided: just replace the third step by restoring
 directly to the target database.
 
 pg_restore --schema-before-data whole.dump >before.sql
 edit before.sql
 pg_restore --schema-after-data whole.dump >after.sql
 edit after.sql
 psql -f before.sql target_db
 pg_restore --data-only -d target_db whole.dump
 psql -f after.sql target_db

This would depend on the dump being in the custom format, though I
suppose that ends up being true for any usage of these options.  I've
never really been a fan of the custom format, in large part because it
doesn't really buy you all that much and makes changing things more
difficult (by having to extract out what you want to change, and then
omit it from the restore).

I can see some advantage to having the entire dump contained in a single
file and still being able to pull out pieces based on before/after.
Should we get a binary format which is much faster, I could see myself
being more likely to use pg_restore.  Same for parallelization or, in my
fantasies, the ability to copy schema, tables, indexes, etc, in 'raw' PG
format between servers.  Worse than having to vi an insanely large file,
or split it up to be able to modify the pieces you want, is having to
rebuild indexes, especially GiST ones.  That's another topic though.
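
To be concrete about the extract-and-omit dance, it ends up being
something like this (archive and object names invented, just a sketch):

  pg_restore -l whole.dump >all.toc                # list the TOC entries
  grep -v 'INDEX .* foo_idx' all.toc >most.toc     # drop the bit to hand-edit
  pg_restore -L most.toc -d target_db whole.dump   # restore everything else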

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] pg_dump additional options for performance

2008-07-27 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Right, but the parallelization is going to happen sometime, and it is
 going to happen in the context of pg_restore.  So I think it's pretty
 silly to argue that no one will ever want this feature to work in
 pg_restore.

I think you've about convinced me on this, and it annoys me. ;)  Worse
is that it sounds like this might cause the options to not make it in
for 8.4, which would be quite frustrating.

 To extend the example I just gave to Stephen, I think a fairly probable
 scenario is where you only need to tweak some before object
 definitions, and then you could do
 
 pg_restore --schema-before-data whole.dump >before.sql
 edit before.sql
 psql -f before.sql target_db
 pg_restore --data-only --schema-after-data -d target_db whole.dump
 
 which (given a parallelizing pg_restore) would do all the time-consuming
 steps in a fully parallelized fashion.

Alright, this has been mulling around in the back of my head a bit and
has now finally surfaced- I like having the whole dump contained in a
single file, but I hate having what ends up being out-dated, wrong,
or "not what was loaded" in the dump file.  Doesn't seem likely to be
possible, but it'd be neat to be able to modify objects in the dump
file.

Also, something which often happens to me is that I need to change the
search_path or the role at the top of a .sql from pg_dump before
restoring it.  Seems like using the custom format would make that
difficult without some pipe/cat/sed magic.  Parallelization would make
using that kind of magic more difficult too, I would guess.  Might be
something to think about.
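
For instance, today the pipe magic is easy because pg_restore will write
SQL to stdout (schema names invented, just a sketch):

  pg_restore whole.dump \
    | sed -e 's/^SET search_path = prod,/SET search_path = qa,/' \
    | psql target_db

but a parallelized pg_restore would presumably have to connect directly
to the database, leaving no stream to run sed over.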

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] pg_dump additional options for performance

2008-07-27 Thread Stephen Frost
* Joshua D. Drake ([EMAIL PROTECTED]) wrote:
 Custom format rocks for partial set restores from a whole dump. See the  
 TOC option :)

I imagine it does, but that's very rarely what I need.  Most of the time
we're dumping out a schema to load it into a separate schema (usually on
another host).  Sometimes that can be done by simply vi'ing the file to
change the search_path and whatnot, though more often we end up pipe'ing
the whole thing through sed.  Since we don't allow regular users to do
much, and you have to 'set role postgres;' to do anything as superuser,
we also often end up adding 'set role postgres;' to the top of the .sql
files.
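
In shell terms that usually ends up looking something like this (schema
and host names invented):

  ( echo 'set role postgres;'
    sed -e 's/^SET search_path = prod,/SET search_path = qa,/' dump.sql
  ) | psql -h otherhost target_db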

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] pg_dump additional options for performance

2008-07-26 Thread Stephen Frost
* Simon Riggs ([EMAIL PROTECTED]) wrote:
 The key capability here is being able to split the dump into multiple
 pieces. The equivalent capability on restore is *not* required, because
 once the dump has been split the restore never needs to be. It might
 seem that the patch should be symmetrical with respect to pg_dump and
 pg_restore, but I see no use case for the pg_restore case.

I'm inclined to agree with this.  It might have been nice to provide a
way to split out already-created dumps, but I suspect that people who
need that probably have already figured out a way to do it (I know I
have..).  We should probably ensure that pg_restore doesn't *break* when
fed a partial dump.
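
(One way to do that split, for what it's worth, leans on the section
comments pg_dump writes into plain-text dumps; strictly a sketch, and
not robust against every dump:

  csplit -f part whole.sql '/^-- Data for Name:/' '/^-- Name:.*Type: INDEX/'

which leaves part00 holding the before-data schema, part01 the data,
and part02 the rest.)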

  The patch as submitted enforces what seem largely arbitrary restrictions
  on combining these switches.
 
 I had it both ways at various points in development. I'm happy with what
 you propose.

I agree with removing the restrictions.  I don't see much in the way of
use cases, but it reduces code and doesn't cause problems.

  Another issue is that the rules for deciding which objects are before
  data and which are after data are wrong.  In particular ACLs are after
  data not before data, which is relatively easy to fix.  
 
 OK

This was partially why I was complaining about having documentation, and
a policy for that matter, which goes into more detail about why X is before
or after the data.  I agree that they're after today, but I don't see
any particular reason why they should be one or the other.  If we
adopted a policy like I proposed (--schema-post-data is essentially that
which uses the data and is faster done in bulk) then ACLs would be
before, and I tend to feel like it makes more sense that way.

  Not so easy to fix
  is that COMMENTs might be either before or after data depending on what
  kind of object they are attached to.
 
 Is there anything to fix? Comments are added by calls to dumpComment,
 which are always made in conjunction with the dump of an object. So if
 you dump the object you dump the comment. As long as objects are
 correctly split out then comments will be also.

I agree with this, and it follows for BLOB comments- in any case, they
go with the object being dumped at the time of that object getting
dumped.  Comments make sense as an extention of the object, not as a
seperate set of objects to be explicitly placed before or after the
data.

 All of the above makes me certain I want to remove these options from
 pg_restore.

I'm in agreement with this.

  BTW, another incomplete item is that pg_dumpall should probably be taught
  to accept and pass down --schema-before-data and --schema-after-data
  switches.
 
 OK

I could go either way on this.

 Can we prune down to the base use case to avoid this overhead? i.e. have
 these options on pg_dump only?

Makes sense to me.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] pg_dump additional options for performance

2008-07-26 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Simon Riggs <[EMAIL PROTECTED]> writes:
  On Fri, 2008-07-25 at 19:16 -0400, Tom Lane wrote:
  The key problem is that pg_restore is broken:
 
  The key capability here is being able to split the dump into multiple
  pieces. The equivalent capability on restore is *not* required, because
  once the dump has been split the restore never needs to be.
 
 This argument is nonsense.  The typical usage of this capability, IMHO,
 will be
 
   pg_dump -Fc >whole.dump
   pg_restore --schema-before-data whole.dump >before.sql
   pg_restore --data-only whole.dump >data.sql
   pg_restore --schema-after-data whole.dump >after.sql
 
 followed by editing the schema pieces and then loading.

I dislike, and doubt that I'd use, this approach.  At the end of the
day, it ends up processing the same (very large amount of data) multiple
times.  We have 60G dump files sometimes, and there's no way I'm going
to dump that into a single file first if I can avoid it.  What we end up
doing today is --schema-only followed by vi'ing it and splitting it up
by hand, etc, then doing a separate --data-only dump.
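
In rough shell terms (file names invented):

  pg_dump --schema-only big_db >schema.sql
  # vi/split schema.sql into tables.sql and indexes.sql by hand
  psql -f tables.sql target_db
  pg_dump --data-only big_db | psql target_db
  psql -f indexes.sql target_db

so the 60G of data is streamed exactly once and never lands in a file.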

 One reason
 is that this gives you a consistent dump, whereas three successive
 pg_dump runs could never guarantee any such thing.  

While this is technically true, in most cases people have control over
the schema bits and would likely be able to ensure that the schema
doesn't change during the time.  At that point it's only the data, which
is still done in a transactional way.

 Another reason is that you may well not know when you prepare the
 dump that you will need split output, because the requirement to edit
 the dump is likely to be realized only when you go to load it.

This is a good point.  My gut reaction is that, at least in my usage, it
would be more about if it's larger than a gig, I might as well split it
out, just in case I need to touch something.  Honestly, it's rare that
I don't have to make *some* change.  Often that's the point of dumping
it out.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] pg_dump additional options for performance

2008-07-23 Thread Stephen Frost
Simon,

* Simon Riggs ([EMAIL PROTECTED]) wrote:
 ...and with command line help also.

The documentation and whatnot looks good to me now.  There are a couple
of other issues I found while looking through and testing the patch
though-

Index: src/bin/pg_dump/pg_dump.c
===
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_dump.c,v
retrieving revision 1.497
diff -c -r1.497 pg_dump.c
*** src/bin/pg_dump/pg_dump.c   20 Jul 2008 18:43:30 -  1.497
--- src/bin/pg_dump/pg_dump.c   23 Jul 2008 17:04:24 -
***
*** 225,232 
RestoreOptions *ropt;
  
static int  disable_triggers = 0;
!   static int  outputNoTablespaces = 0;
static int  use_setsessauth = 0;
  
static struct option long_options[] = {
{"data-only", no_argument, NULL, 'a'},
--- 229,238 
RestoreOptions *ropt;
  
static int  disable_triggers = 0;
!   static int outputNoTablespaces = 0;
static int  use_setsessauth = 0;
+   static int  use_schemaBeforeData;
+   static int  use_schemaAfterData;
  
static struct option long_options[] = {
{"data-only", no_argument, NULL, 'a'},
***

This hunk appears to have a bit of gratuitous whitespace change, not a
big deal tho.

***
*** 464,474 
[...]
+   if (dataOnly)
+   dumpObjFlags = REQ_DATA;
+ 
+   if (use_schemaBeforeData == 1)
+   dumpObjFlags = REQ_SCHEMA_BEFORE_DATA;
+ 
+   if (use_schemaAfterData == 1)
+   dumpObjFlags = REQ_SCHEMA_AFTER_DATA;
+ 
+   if (schemaOnly)
+   dumpObjFlags = (REQ_SCHEMA_BEFORE_DATA | REQ_SCHEMA_AFTER_DATA);
***

It wouldn't kill to be consistent between testing for '== 1' and just
checking for non-zero.  Again, not really a big deal, and I wouldn't
mention these if there weren't other issues.

***
*** 646,652 
 * Dumping blobs is now default unless we saw an inclusion switch or -s
 * ... but even if we did see one of these, -b turns it back on.
 */
!   if (include_everything && !schemaOnly)
outputBlobs = true;
  
/*
--- 689,695 
 * Dumping blobs is now default unless we saw an inclusion switch or -s
 * ... but even if we did see one of these, -b turns it back on.
 */
!   if (include_everything && WANT_PRE_SCHEMA(dumpObjFlags))
outputBlobs = true;
  
/*
***

Shouldn't this change be to WANT_DATA(dumpObjFlags)?  That's what most
of the '!schemaOnly' get translated to.  Otherwise I think you would be
getting blobs when you've asked for just schema-before-data, which
doesn't seem like it'd make much sense.

***
*** 712,718 
dumpStdStrings(g_fout);
  
/* The database item is always next, unless we don't want it at all */
!   if (include_everything && !dataOnly)
dumpDatabase(g_fout);
  
/* Now the rearrangeable objects. */
--- 755,761 
dumpStdStrings(g_fout);
  
/* The database item is always next, unless we don't want it at all */
!   if (include_everything && WANT_DATA(dumpObjFlags))
dumpDatabase(g_fout);
  
/* Now the rearrangeable objects. */
***

Shouldn't this be 'WANT_PRE_SCHEMA(dumpObjFlags)'?

***
*** 3414,3420 
continue;
  
/* Ignore indexes of tables not to be dumped */
!   if (!tbinfo->dobj.dump)
continue;
  
if (g_verbose)
--- 3459,3465 
continue;
  
/* Ignore indexes of tables not to be dumped */
!   if (!tbinfo->dobj.dump || !WANT_POST_SCHEMA(dumpObjFlags))
continue;
  
if (g_verbose)
***

I didn't test this, but it strikes me as an unnecessary addition?  If
anything, wouldn't this check make more sense being done right after
dropping into getIndexes()?  No sense going through the loop just for
fun..  Technically, it's a behavioral change for --data-only since it
used to gather index information anyway, but it's a good optimization if
done in the right place.

Also around here, there doesn't appear to be any checking in
dumpEnumType(), which strikes me as odd.  Wouldn't that deserve a

if (!WANT_PRE_SCHEMA(dumpObjFlags))
return;

check?  If not, even some kind of equivalent ->dobj.dump check..

***
*** 9803,9809 
tbinfo->dobj.catId, 0, 
tbinfo->dobj.dumpId);
}
  
!   if (!schemaOnly)
{
resetPQExpBuffer(query);
appendPQExpBuffer(query, "SELECT pg_catalog.setval(");
--- 9848,9854 
tbinfo->dobj.catId, 0, 
tbinfo->dobj.dumpId);
}
  
!   

Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Stephen Frost
* Simon Riggs ([EMAIL PROTECTED]) wrote:
 The options split the dump into 3 parts, that's all: before the load, the
 load and after the load.
 
 --schema-pre-load says
 "Dumps exactly what <option>--schema-only</> would dump, but only those
 statements before the data load."
 
 What is it you are suggesting? I'm unclear.

That part is fine, the problem is that elsewhere in the documentation
(patch line starting ~774 before, ~797 after, in pg_dump.sgml) you
change it to be "objects required before data loading", which isn't the
same.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Stephen Frost
Simon,

* Simon Riggs ([EMAIL PROTECTED]) wrote:
  I hadn't realized that Simon was using "pre-schema" and "post-schema"
  to name the first and third parts.  I'd agree that this is confusing
  nomenclature: it looks like it's trying to say that the data is the
  schema, and the schema is not!  How about "pre-data" and "post-data"?
 
 OK by me. Any other takers?

Having the command-line options be --schema-pre-data and
--schema-post-data is fine with me.  Leaving them the way they are is
also fine by me.  It's the documentation (back to pg_dump.sgml,
~774/~797) that starts talking about "Pre-Schema" and "Post-Schema".

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Stephen Frost
Tom, et al,

* Tom Lane ([EMAIL PROTECTED]) wrote:
 Ah, I see.  No objection to those switch names, at least assuming we
 want to stick to positive-logic switches.  What did you think of the
 negative-logic suggestion (--omit-xxx)?

My preference is for positive-logic switches in general.  The place
where I would use this patch would lend itself to being more options if
--omit- were used.  I expect that would hold true for most people.
It would be:

  --omit-data --omit-post-load
  --omit-pre-load --omit-post-load
  --omit-pre-load --omit-data

vs.

  --schema-pre-load
  --data-only
  --schema-post-load

Point being that I'd be dumping these into separate files where I could
more easily manipulate the pre-load or post-load files.  I'd still want
pre/post load to be separate though since this would be used in cases
where there's a lot of data (hence the reason for the split) and putting
pre and post together and running them before data would slow things
down quite a bit.

Are there use cases for just --omit-post-load or --omit-pre-load?
Probably, but I just don't see any situation where I'd use them like
that.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] pg_dump additional options for performance

2008-07-20 Thread Stephen Frost
Simon,

* Simon Riggs ([EMAIL PROTECTED]) wrote:
 On Sun, 2008-07-20 at 05:47 +0100, Simon Riggs wrote:
  On Sat, 2008-07-19 at 23:07 -0400, Stephen Frost wrote:
[...]
 - Conflicting option handling

Thanks for putting in the extra code to explicitly indicate which
conflicting options were used.

 - Documentation
 When writing the documentation I would stress that pre-schema and
 post-schema be defined in terms of PostgreSQL objects and why they
 are pre vs. post.

Perhaps this is up for some debate, but I find the documentation added
for these options to be lacking the definitions I was looking for, and
the explanation of why they are what they are.  I'm also not sure I
agree with the "Pre-Schema" and "Post-Schema" nomenclature as it doesn't
really fit with the option names or what they do.  Would you consider:

<term><option>--schema-pre-load</option></term>
<listitem>
 <para>
   Pre-Data Load - Minimum amount of the schema required before data
   loading can begin.  This consists mainly of creating the tables
   using the <command>CREATE TABLE</command>.
   This part can be requested using <option>--schema-pre-load</>.
 </para>
</listitem>

<term><option>--schema-post-load</option></term>
<listitem>
 <para>
   Post-Data Load - The rest of the schema definition, including keys,
   indexes, etc.  By putting keys and indexes after the data has been
   loaded the whole process of restoring data is much faster.  This is
   because it is faster to build indexes and check keys in bulk than
   piecemeal as the data is loaded.
   This part can be requested using <option>--schema-post-load</>.
 </para>
</listitem>

Even this doesn't cover everything though- it's too focused on tables
and data loading.  Where do functions go?  What about types?

A couple of additional points:

  - The command-line help hasn't been updated.  Clearly, that also needs
to be done to consider the documentation aspect complete.

  - There appears to be a bit of mistakenly included additions.  The
patch to pg_restore.sgml attempts to add in documentation for
--superuser.  I'm guessing that was unintentional, and looks like
just a mistaken extra copy&paste.

 - Technically, the patch needs to be updated slightly since another
 pg_dump-related patch was committed recently which also added
 options and thus causes a conflict.

I think this might have just happened again, funny enough.  It's
something that a committer could perhaps fix, but if you're reworking
the patch anyway...

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] pg_dump additional options for performance

2008-07-20 Thread Stephen Frost
* Simon Riggs ([EMAIL PROTECTED]) wrote:
 On Sun, 2008-07-20 at 17:43 -0400, Stephen Frost wrote:
  Even this doesn't cover everything though- it's too focused on tables
  and data loading.  Where do functions go?  What about types?
 
 Yes, it is focused on tables and data loading. What about
 functions/types? No relevance here.

I don't see how they're not relevant, it's not like they're being
excluded and in fact they show up in the pre-load output.  Heck, even if
they *were* excluded, that should be made clear in the documentation
(either be an explicit include list, or saying they're excluded).

Part of what's driving this is making sure we have a plan for future
objects and where they'll go.  Perhaps it would be enough to just say
pre-load is "everything in the schema, except things which are faster
done in bulk (e.g.: indexes, keys)".  I don't think it's right to say
pre-load is "only object definitions required to load data" when it
includes functions and ACLs though.

Hopefully my suggestion and these comments will get us to a happy
middle-ground.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] pg_dump additional options for performance

2008-07-20 Thread Stephen Frost
* daveg ([EMAIL PROTECTED]) wrote:
 One observation, indexes should be built right after the table data
 is loaded for each table, this way, the index build gets a hot cache
 for the table data instead of having to re-read it later as we do now.

That's not how pg_dump has traditionally worked, and the point of this
patch is to add options to easily segregate the main pieces of the
existing pg_dump output (main schema definition, data dump, key/index
building).  You suggestion brings up an interesting point that should
pg_dump's traditional output structure change the --schema-post-load
set of objects wouldn't be as clear to newcomers since the load and the
indexes would be interleaved in the regular output.

I'd be curious about the performance impact this has on an actual load
too.  It would probably be more valuable on smaller loads (where it would
have less of an impact anyway) than on loads larger than the cache size.
Still, not an issue for this patch, imv.

Thanks,

Stephen


signature.asc
Description: Digital signature


[PATCHES] pg_dump additional options for performance

2008-07-19 Thread Stephen Frost
Simon,

  I agree with adding these options in general, since I find myself
  frustrated by having to vi huge dumps to change simple schema things.
  A couple of comments on the patch though:

  - Conflicting option handling
I think we are doing our users a disservice by putting it on them to
figure out exactly what:
"multiple object groups cannot be used together"
means to them.  You and I may understand what an "object group" is,
and why there can be only one, but it's a great deal less clear than
the prior message of
"options -s/--schema-only and -a/--data-only cannot be used together"
My suggestion would be to either list out the specific options which
can't be used together, as was done previously, or add a bit of (I
realize, boring) code and actually tell the user which of the
conflicting options were used.

  - Documentation
When writing the documentation I would stress that pre-schema and
post-schema be defined in terms of PostgreSQL objects and why they
are pre vs. post.

  - Technically, the patch needs to be updated slightly since another
pg_dump-related patch was committed recently which also added
options and thus causes a conflict.

  Beyond those minor points, the patch looks good to me.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] Feature: give pg_dump a WHERE clause expression

2008-06-01 Thread Stephen Frost
* daveg ([EMAIL PROTECTED]) wrote:
 The feature that the proposed patch enables is to create pg_dump custom
 format archives for multiple tables with a predicate. No amount of csv or
 xml will do that. Contrived example:

Uh, pg_dump's custom format really isn't particularly special, to be
honest.  Is there some reason you're interested in using it over, as was
suggested, COPY/CSV/etc format?  You could, of course, gzip the output
of COPY (as pg_dump does) if you're concerned about space..
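
Something along these lines, say (table and predicate invented):

  psql -d mydb -c \
    "COPY (SELECT * FROM orders WHERE shipped) TO STDOUT WITH CSV" \
    | gzip >orders.csv.gz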

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] column level privileges

2008-05-07 Thread Stephen Frost
* Andrew Dunstan ([EMAIL PROTECTED]) wrote:
 Tom Lane wrote:
 I'm not sure where we go from here.  Your GSOC student has disappeared,
 right?  Is anyone else willing to take up the patch and work on it?

 No, he has not disappeared at all. He is going to work on fixing issues  
 and getting the work up to SQL99 level.

Great!

 Your review should help enormously.

 Stephen, perhaps you would like to work with him.

I'd be happy to.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] column level privileges

2008-05-06 Thread Stephen Frost
Tom, et al,

* Tom Lane ([EMAIL PROTECTED]) wrote:
 I'm not sure where we go from here.  Your GSOC student has disappeared,
 right?  Is anyone else willing to take up the patch and work on it?

I'm willing to take it up and work on it.  I'm very interested in having
column-level privileges in PG.  I also have some experience in the
gram.y and ACL areas already that should make things go quickly.  If
anyone else is interested/has resources, please let me know.

 Admittedly the patch's syntax is more logical (especially if you
 consider the possibility of multiple tables) but I don't think we
 can go against the spec.  This problem invalidates the gram.y changes
 and a fair amount of what was done in aclchk.c.

Agreed, we need to use the SQL spec syntax.
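
That is, the spec hangs the column list off the privilege, along these
lines (table/column/role names invented):

  psql -d mydb -c 'GRANT SELECT (id, name), UPDATE (name) ON customers TO clerk;'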

 2. The checking of INSERT/UPDATE permissions has been moved to a
 completely unacceptable time/place, namely during parse analysis instead
 of at the beginning of execution.  This is unusable for prepared
 queries, for example, and it also fails to apply permission checking
 properly for UPDATEs of inheritance trees (only the parent would get
 checked).  This seems not very simple to fix :-(.  By the time the plan
 gets to the executor it is not clear which columns were actually
 specified as targets by the user and which were filled in as defaults by
 the rewriter or planner.  One possible solution is to add a flag field
 to TargetEntry to carry the information forward.

I'll look into this, I liked the bitmap idea, personally.

 Some other points that need to be considered:
 
  1. The execution of GRANT/REVOKE for column privileges. Now only 
  INSERT/UPDATE/REFERENCES privileges are supported, as SQL92 specified. 
  SELECT privilege is now not supported.
 
 Well, SQL99 has per-column SELECT privilege, so we're already behind
 the curve here.  The main problem again is to figure out a reasonable
 means for the executor to know which columns to check.  TargetEntry
 markings won't help here.  I thought about adding a bitmap to each
 RTE showing the attribute numbers explicitly referenced in the query,
 but I'm unsure if that's a good solution or not.
 
 I'd be willing to leave this as work to be done later, since 90% of
 the patch is just concerned with the mechanics of storing per-column
 privileges and doesn't care which ones they are exactly.  But it
 needs to be on the to-do list.

I think it would be quite unfortunate to not include per-column SELECT
privileges with the initial version.  It has significant uses and would
really be a pretty obvious hole in our implementation.

 What I think would be a more desirable solution is that the table ACL
 still sets the table-level INSERT or UPDATE privilege bit as long as
 you have any such privilege.  In the normal case where no per-column
 privilege has been granted, the per-column attacl fields all remain
 NULL and that's all you need.  As soon as any per-column GRANT or
 REVOKE is issued against a table, expand out the per-column ACLs to
 match the previous table-level state, and then apply the per-column
 changes.  I think you'd need an additional pg_class flag column,
 say relhascolacls, to denote whether this has been done --- then
 privilege checking would know it only needs to look at the column
 ACLs when this field is set.

I agree with this approach and feel it's a lot cleaner as well as faster.
We definitely don't want to make permission checking take any more time
than it absolutely has to.

 With this scheme we don't need per-column entries in pg_shdepend,
 we can just reference the table-level bits as before.  REVOKE would have
 the responsibility of getting rid of per-column entries, if any, as a
 followup to revoking per-table entries during a DROP USER operation.

Doesn't sound too bad.

 Something else that needs to be thought about is whether system columns
 have privileges or not.  The patch seems to be assuming not in some
 places, but at least for SELECT it seems like this might be sensible.
 Also, you can already do COPY TO the OID column if any, so even without
 any future extensions it seems like we've got the issue in front of us.

I certainly feel we should be able to have per-column privileges on
system columns, though we should only use them where appropriate, of
course.

 One other mistake I noted was that the version checks added in pg_dump
 and psql are >= 80300, which of course is obsolete now.

That one's pretty easy to handle. :)

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] Proposed patch to disallow password=foo in database name parameter

2007-12-10 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Anybody think this is good, bad, or silly?  Does the issue need
 explicit documentation, and if so where and how?

I'm going to have to vote 'silly' on this one.  While I agree that in
general we should discourage, and not provide explicit command-line
options for, passing a password on the command-line, I don't feel that
it makes sense to explicitly complicate things to prevent it.

Just my 2c,

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] allow CSV quote in NULL

2007-07-27 Thread Stephen Frost
* Gregory Stark ([EMAIL PROTECTED]) wrote:
 Tom Lane <[EMAIL PROTECTED]> writes:
 
  Stephen Frost <[EMAIL PROTECTED]> writes:
Please find attached a minor patch to remove the constraints that a
user can't include the delimiter or quote characters in a 'NULL AS'
string when importing CSV files.
 
  This can't really be sane can it?
 
 Including unquoted delimiter characters seems kind of insane. But including
 the quote characters or quoted delimiter characters seems reasonable. 

Right.  We could write something to check if the user provides an
unquoted delimiter character but I'm not really sure it's worth it, to
be perfectly honest.  If it'll make people happier with the patch I can
do it though.

 The alternative would be interpreting NULL strings after dequoting but that
 would leave no way to include the NULL string literally. This solution means
 there's no way to include it (if it needs quoting) but only when you specify
 it this way.

Yeah, interpreting NULLs after dequoting means you've lost the
information about if it's quoted or not, or you have to add some funky
syntax to say "if it's quoted, do it differently...", which is no good,
imv.

What the patch does basically is say "give us the exact string that
shows up between the unquoted delimiters that you want to be treated
as a NULL".  This removes the complexity of the question about quoting,
unquoting, whatever, and makes it a very clear-cut, straight-forward
solution with no impact on existing users, imv.

Thanks,

Stephen


signature.asc
Description: Digital signature


[PATCHES] allow CSV quote in NULL

2007-07-26 Thread Stephen Frost
Greetings,

  Please find attached a minor patch to remove the constraints that a
  user can't include the delimiter or quote characters in a 'NULL AS'
  string when importing CSV files.

  This allows a user to explicitly request that NULL conversion happen
  on fields which are quoted.  As the quote character is being allowed
  to be in the 'NULL AS' string now, there's no reason to exclude the
  delimiter character from being seen in that string as well, though
  unless quoted using the CSV quote character it won't ever be matched.

  An example of the usage:

  sfrost*=# \copy billing_data from ~/BillingSamplePricerFile.csv
with csv header quote as '"' null as '""'

  This is no contrived example, it's an issue I ran into earlier today
  when I got a file which had (for reasons unknown to me and not easily
  changed upstream): 
  
  1,"V","WASHDCABC12","","120033"...

  Both of the ending columns shown are integer fields, the "" here being
  used to indicate a NULL value.

  Without the patch, an ERROR occurs:

  sfrost=> \copy billing_data from ~/BillingSamplePricerFile.csv 
with csv header quote as '"'
  ERROR:  invalid input syntax for integer: ""

  And there's no way to get it to import with COPY CSV mode.  The
  patch adds this ability without affecting existing usage or changing
  the syntax.  Even with the patch an ERROR occurs with the default
  treatment of CSV files:

  sfrost=# \copy billing_data from ~/BillingSamplePricerFile.csv
   with csv header quote as '"'
  ERROR:  invalid input syntax for integer: ""

  Which would be expected.  If the file is modified to remove the ""s
  for NULL columns, it imports just fine with the syntax above.

  It'd be really nice to have this included.

Thanks!

Stephen
Index: src/backend/commands/copy.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.285
diff -c -r1.285 copy.c
*** src/backend/commands/copy.c	20 Jun 2007 02:02:49 -	1.285
--- src/backend/commands/copy.c	27 Jul 2007 04:13:15 -
***
*** 926,944 
  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
			  errmsg("COPY force not null only available using COPY FROM")));
  
- 	/* Don't allow the delimiter to appear in the null string. */
- 	if (strchr(cstate->null_print, cstate->delim[0]) != NULL)
- 		ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- 		errmsg("COPY delimiter must not appear in the NULL specification")));
- 
- 	/* Don't allow the CSV quote char to appear in the null string. */
- 	if (cstate->csv_mode &&
- 		strchr(cstate->null_print, cstate->quote[0]) != NULL)
- 		ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-  errmsg("CSV quote character must not appear in the NULL specification")));
- 
  	/* Disallow file COPY except to superusers. */
  	if (!pipe && !superuser())
  		ereport(ERROR,
--- 926,931 
***
*** 2888,2894 
  	{
  		bool		found_delim = false;
  		bool		in_quote = false;
- 		bool		saw_quote = false;
  		char	   *start_ptr;
  		char	   *end_ptr;
  		int			input_len;
--- 2875,2880 
***
*** 2921,2927 
  			/* start of quoted field (or part of field) */
  			if (c == quotec && !in_quote)
  			{
- saw_quote = true;
  in_quote = true;
  continue;
  			}
--- 2907,2912 
***
*** 2971,2977 
  
  		/* Check whether raw input matched null marker */
  		input_len = end_ptr - start_ptr;
! 		if (!saw_quote && input_len == cstate->null_print_len &&
  			strncmp(start_ptr, cstate-null_print, input_len) == 0)
  			fieldvals[fieldno] = NULL;
  
--- 2956,2962 
  
  		/* Check whether raw input matched null marker */
  		input_len = end_ptr - start_ptr;
! 		if (input_len == cstate->null_print_len &&
  			strncmp(start_ptr, cstate-null_print, input_len) == 0)
  			fieldvals[fieldno] = NULL;
  



Re: [PATCHES] SSPI authentication - patch

2007-07-20 Thread Stephen Frost
* Magnus Hagander ([EMAIL PROTECTED]) wrote:
 On Thu, Jul 19, 2007 at 06:22:57PM -0400, Stephen Frost wrote:
  My thinking would be to have the autoconf to disable it, but enable it
  by default.  I don't feel particularly strongly about it though.
 
 Do you see a use-case where someone would disable it? I'll be happy to add
 the switch if you do, it's not hard to do, but adding a switch just for the
 sake of adding a switch is not something I like :-)

Eh, I could contrive one but, as I said, I don't feel particularly
strongly about it.  How about we go w/o it for now and see if anyone
asks for it.

  I understand that SSPI is case-insensitive, or folds to uppercase, or
  whatever, but this is *not* used only by the SSPI code.  Please correct
  me if I'm wrong, but this will break existing krb-auth using client
  applications/setups that went with the previous default, no?  I realize
  it's on Windows, but there are people out there with that
  configuration (yes, like me... :)...
 
 Ok, first to clearify the facts:
 * SSPI is case-insensitive, case-preserving
 * The problem is not from SSPI. It's Active Directory. If you use AD as the
 KDC, you must use uppercase SPNs - regardless of SSPI. For example, it's
 needed for anybody wanting to use the old krb5 auth in 8.x together with
 Active Directory - like I do :-)

Ah, thanks for clearing up where the problem arises from.

 The change is there to because the majority of windows installs will
 be using Active Directory, at least that's what I would expect. Certainly
 not all, but most. It's a way of lowering the bar for the majority, at the
 expense of the minority ;-)

It's also at the expense of backwards compatibility. :/  People who are
currently using the krb5 auth mechanism with AD are used to having to
flip that or set the environment variable while people who have been
using it with an MIT KDC may get surprised by it.

 That said, I actually intended to submit that as a separate patch for
 separate discussion. If people are against it, I'll be happy to drop that
 part.

My main concern is that it's a backward-incompatible change.  I realize
that it's likely going in the direction of the majority on Windows but
it seems to me like it's not something we should just 'do'.  That
said, I don't see it as a problem for me since I've got a reasonably
small user-base (10s, not 100s or 1000s) of Windows users and setting
the environment variable shouldn't be an issue.

 Again, it's not related to the library used, it's related to the KDC. And
 we can't detect that, at least not early enough.

That's true, but if we used upper-case with something NEW (SSPI) while
keeping it the same for the OLD (KRB5, and I'd vote GSSAPI) then we're
not breaking backwards compatibility while also catering to the masses.
I guess I don't see too many people using SSPI w/ an MIT KDC, and it
wasn't possible previously anyway.

What do you think?

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] SSPI authentication - patch

2007-07-19 Thread Stephen Frost
* Magnus Hagander ([EMAIL PROTECTED]) wrote:
 Here's an updated version of this patch. This version has full SSPI support
 in the server as well, so I can do both kerberos and NTLM between two
 windows machines using the negotiate method.

Great!  Also, I've tested that it works under Windows using
PGGSSLIB=gssapi with the MIT GSS libraries.  I did have to set the
PGKRBSRVNAME to 'postgres'.  It worked excellently. :)
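
For anyone else trying it, the client side amounted to something like
(Windows cmd syntax; server and database names invented):

  set PGGSSLIB=gssapi
  set PGKRBSRVNAME=postgres
  psql -h dbserver mydb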

 Since SSPI and GSSAPI can now both be used, my plan is not to have an
 autoconf to disable SSPI, but to just enable it unconditionally on win32.
 Or does this seem like a bad idea?

My thinking would be to have the autoconf to disable it, but enable it
by default.  I don't feel particularly strongly about it though.

 Comments welcome.

It looks good in general to me (though I'm not super-familiar with
SSPI).  My one big concern is this:

   /* Define to the name of the default PostgreSQL service principal in 
 Kerberos.
  (--with-krb-srvnam=NAME) */
! #define PG_KRB_SRVNAM "postgres"
   
   /* A string containing the version number, platform, and C compiler */
  #define PG_VERSION_STR "Uninitialized version string (win32)"
 --- 582,588 
   
   /* Define to the name of the default PostgreSQL service principal in 
 Kerberos.
  (--with-krb-srvnam=NAME) */
! #define PG_KRB_SRVNAM "POSTGRES"

I understand that SSPI is case-insensitive, or folds to uppercase, or
whatever, but this is *not* used only by the SSPI code.  Please correct
me if I'm wrong, but this will break existing krb-auth using client
applications/setups that went with the previous default, no?  I realize
it's on Windows, but there are people out there with that
configuration (yes, like me... :)...

I don't particularly like it but, honestly, it seems like it might be
better to set it based on what's being used (GSSAPI/SSPI/KRB5)?  This
would be for the client-side, as I guess we've decided it's okay to just
pick whatever key the users provide that's in our server-side
keytab.

Thanks again!

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] dblink connection security

2007-07-09 Thread Stephen Frost
* Joe Conway ([EMAIL PROTECTED]) wrote:
 Get serious. Internal functions are specifically designed and maintained to 
 be safe within the confines of the database security model. We are 
 discussing extensions to the core, all of which must be installed by 
 choice, by a superuser.

Extensions should also be designed and maintained to be safe within the
confines of the database security model.  Having to be installed by a
superuser doesn't change that.  I would consider it a serious risk which
would need to be fixed if, for example, a function in PostGIS was found
to allow privilege escalation by a user.  Claiming it was "installed by
choice, by a superuser" doesn't change that.

It's about as good as saying "Well, an admin had to install PostgreSQL
on the system, by choice, and therefore we don't need to worry about PG
allowing someone remote shell access to the system."

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] dblink connection security

2007-07-09 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Joe Conway <[EMAIL PROTECTED]> writes:
  But if you know of a security risk related to using libpq 
  with a password authenticated connection, let's hear it.
 
 As near as I can tell, the argument is that dblink might be used to send
 connection-request packets to random addresses.  Now this is only a

Yes.

 security issue if the attacker could not have reached such an address
 directly; otherwise he might as well send the packet himself (and have a

No.  Being able to come from a different address is valuable even if you
can get to that address directly yourself.

 lot more control over its content).  So I guess the scenario is that
 you're running your database on your firewall machine, where it is
 accessible from outside your net but also can reach addresses inside.

It wouldn't need to be on your firewall, just behind it, which is
extremely common.

 And you're letting untrustworthy outside people log into the database.

It's not nearly so convoluted.  SQL injections happen.  

 And you put dblink on it for them to use.  And even then, the amount of
 damage they could do seems pretty limited due to lack of control over
 the packet contents.

dblink could have been installed for a variety of reasons.  Making it
openly available on install makes it much less likely any additional
restrictions were placed on it.

 To me this scenario is too far-fetched to justify sacrificing
 convenience and backwards compatibility.  It should be sufficient to add
 some paragraphs about security considerations to the dblink docs.

I feel that requiring a sysadmin to issue a 'grant' if they want
that convenience is justified and reasonable.  We could include the
statement itself in the documentation we're expecting them to read
anyway so they can just copy & paste it.  Adding paragraphs to the
documentation is good but doesn't justify an insecure-by-default
approach.
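
Something like the following is all they'd need to copy & paste (adjust
for the exact dblink signatures shipped; a sketch):

  psql -d mydb -c 'REVOKE EXECUTE ON FUNCTION dblink(text, text) FROM PUBLIC;'
  psql -d mydb -c 'GRANT EXECUTE ON FUNCTION dblink(text, text) TO trusted_role;'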

Regardless of what core ends up doing, I'm hopeful it'll be disabled by
default under Debian.  It'd certainly be easier if it was done upstream.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] dblink connection security

2007-07-08 Thread Stephen Frost
* Gregory Stark ([EMAIL PROTECTED]) wrote:
 Joe Conway <[EMAIL PROTECTED]> writes:
  If there are no objections I'll commit this later today.
 
 My objection is that I think we should still revoke access for non-superuser
 by default. The patch makes granting execute reasonable for most users but
 nonetheless it shouldn't be the default.
 
 Being able to connect to a postgres server shouldn't mean being able to open
 tcp connections *from* that server to arbitrary other host/ports. Consider for
 example that it would allow a user to perform a port scan from inside your
 network to see what internal services are running.

I'm in agreement with Greg.  It's a poor idea, overall, to allow users
to initiate TCP connections from the backend.  That should be a
superuser-only ability and should require security definer functions
with appropriate safe-guards (which would be site-specific) to be
created by the end admins.
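
A sketch of the kind of site-specific wrapper I mean, with the
connection string nailed down (all names invented; load it with
psql -d mydb -f wrapper.sql):

  -- wrapper.sql
  CREATE FUNCTION fetch_remote_counts(OUT tag text, OUT n int)
  RETURNS SETOF record LANGUAGE sql SECURITY DEFINER AS $$
      SELECT * FROM dblink('host=10.0.0.5 dbname=stats',
                           'SELECT tag, n FROM counts') AS t(tag text, n int);
  $$;
  REVOKE ALL ON FUNCTION fetch_remote_counts() FROM PUBLIC;
  GRANT EXECUTE ON FUNCTION fetch_remote_counts() TO reporting;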

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] dblink connection security

2007-07-08 Thread Stephen Frost
* Gregory Stark ([EMAIL PROTECTED]) wrote:
 Actually from a security point of view revoking public execute is pretty much
 the same as making a function super-user-only. The only difference is how much
 of a hassle it is for the super-user to grant access. Perhaps we should
 reconsider whether any of the other super-user-only functions should be simply
 not executable by default but work normally if granted.

Revoking public execute on it by default would definitely make me
happier.  I could be swayed either way on the explicit super-user check
in the function itself.  In the general case, imv we should at least
attempt to consider the risk involved in improper handling of the
permissions around super-user-only functions.  Higher risk implies an
extra check in the code to force use of SECURITY DEFINER functions to
work around it, in an attempt to impart the severity of the risk.

Thinking about it a bit more, I'd honestly like to see the check there
for dblink().  That's not entirely fair of me though I suppose, I really
don't feel comfortable with dblink() to begin with and don't expect I'll
ever use it. :)

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] dblink connection security

2007-07-08 Thread Stephen Frost
* Joe Conway ([EMAIL PROTECTED]) wrote:
 If you are going to argue that we should revoke access for non-superusers 
 by default for dblink, then you are also arguing that we should do the same 
 for every function created with any untrusted language.

Uh, no, one doesn't imply the other.  It doesn't follow that because a
specific, known-insecure function shouldn't be available to all users
immediately that quite probably safe/secure functions (even though
they're written in an untrusted language- what has that got to do with
anything?) also shouldn't be.

 E.g. as I pointed out to Robert last week, just because an unsafe function 
 is created in plperlu, it doesn't mean that a non-superuser can't run it 
 immediately after it is created. There is no difference. It is incumbent 
 upon the DBA/superuser to be careful _whenever_ they create any function 
 using an untrusted language.

This isn't a case of the DBA/superuser writing the function.  It's being
provided by a package.  It's also *inherently* insecure and isn't just a
matter of being careful.  You can create functions in an untrusted
language carefully enough to allow it to be called by other users.  It
is simply prudent for the package provider to disable insecure functions
by default.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] dblink connection security

2007-07-08 Thread Stephen Frost
* Joe Conway ([EMAIL PROTECTED]) wrote:
 Consider a scenario like "package x uses arbitrary function y in an 
 untrusted language z". Exact same concerns arise.

No, it doesn't...  Said arbitrary function in y, in untrusted language
z, could be perfectly safe for users to call.  Being written in an
untrusted language has got next to nothing to do with the security
implications of a particular function.  It depends entirely on what the
function is *doing*, not what language it's written in.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] dblink connection security

2007-07-08 Thread Stephen Frost
* Joe Conway ([EMAIL PROTECTED]) wrote:
 Stephen Frost wrote:
 No, it doesn't...  Said arbitrary function in y, in untrusted language
 z, could be perfectly safe for users to call.
  ^
 *Could* be. But we just said that the admin was not interested in reading 
 the documentation, and has no idea if it *is* safe. And, it very well might 
 not be safe. We have no way to know in advance because the language is 
 untrusted.

If it's not safe then it shouldn't be enabled by default.  That's pretty
much the point.  If something is known to be unsafe for users to have
access to then it should be disabled by default.

 Being written in an untrusted language has got next to nothing to do with 
 the security
 implications of a particular function.  It depends entirely on what the
 function is *doing*, not what language it's written in.

 Sure it matters. A function written in a trusted language is known to be 
 safe, a priori. A function written in an untrusted language has no such 
 guarantees, and therefore has to be assumed unsafe unless carefully proved 
 otherwise.

I see..  So all the functions in untrusted languages that come with PG
initially should be checked over by every sysadmin when installing PG
every time...  And the same for PostGIS, and all of the PL's that use
untrusted languages?

On my pretty modest install that's 2,206 functions.  For some reason I
see something of a difference between 'generate_series' and 'dblink' in
terms of security and which one I'm comfortable having enabled by
default and which one I'm not.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] dblink connection security

2007-07-08 Thread Stephen Frost
* Gregory Stark ([EMAIL PROTECTED]) wrote:
 Joe Conway <[EMAIL PROTECTED]> writes:
  Consider a scenario like "package x uses arbitrary function y in an
  untrusted language z". Exact same concerns arise.
 
 Well "arbitrary function" may or may not actually do anything that needs to be
 restricted.
 
 If it does then yes the same concerns arise and the same conclusion reached.
 That users should be granted permission to execute it based on local policies.
 Certainly granting execute permission to public by default is a bad start in
 that regard.

Agreed, and regardless of the sysadmin doing x, y, or z, or what some
other package might be doing with untrusted languages, what matters here
is what we're doing and the functions we're providing.  Best practice is
to disable functions by default which aren't safe & secure for users to
have access to.

If you know of any others in anything we're distributing, please point
them out.  If there are some in related projects, point those out and
ask those projects to be careful with them and encourage them to disable
them by default.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] dblink connection security

2007-07-08 Thread Stephen Frost
* Joe Conway ([EMAIL PROTECTED]) wrote:
 Stephen Frost wrote:
 I see..  So all the functions in untrusted languages that come with PG
 initially should be checked over by every sysadmin when installing PG
 every time...  And the same for PostGIS, and all of the PL's that use
 untrusted languages?

 There are none installed by default -- that's the point.

Uhh...  None what?  Functions in untrusted languages?  That's certainly
not the case, there's a whole slew of them, from boolin to
generate_series and beyond.  They're available to regular users, even!

Or do you mean that there are no known-insecure functions which are
installed and enabled for users to use by default?  I'd have to agree
with you there in general, would kind of like to keep it that way too.

Perhaps you're referring to PLs, but then, I thought trusted PLs were
safe, but they're written using untrusted languages!  Are they safe, or
not?  Safe to use, but not safe to install?

 On my pretty modest install that's 2,206 functions.  For some reason I
 see something of a difference between 'generate_series' and 'dblink' in
 terms of security and which one I'm comfortable having enabled by
 default and which one I'm not.

 generate_series is a built-in function. We aren't discussing those.

Uh, it's written in an untrusted language, isn't it?  Us poor sysadmins
are supposed to review all of them before letting users have access to
them, aren't we?  Now I'm just completely confused as to the distinction
you're making here.  Are functions in untrusted languages a problem,
or not?

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] Preliminary GSSAPI Patches

2007-06-22 Thread Stephen Frost
* Henry B. Hotz ([EMAIL PROTECTED]) wrote:
 On Jun 22, 2007, at 9:56 AM, Magnus Hagander wrote:
 Most likely it's just checking the keytab to find a principal with the
 same name as the one presented from the client. Since one is  
 present, it
 loads it up automatically, and verifies against it.
 
 Bingo!
 
 The server uses the keytab to decrypt the token provided by the  
 client.  By using the GSS_C_NO_CREDENTIAL arg on the server anything  
 put in the keytab is OK.  (The server doesn't need to authenticate  
 itself to Kerberos, it just accepts authentication.  Mutual  
 authentication is done using the same keys.)  The documentation needs  
 to reflect that.

I agree there's some disconnect there between the documentation and the
apparent implementation but I'm not sure I'm in favor of changing the
documentation on this one.  Personally, I'd rather it return an error if
someone tries to use GSS_C_NO_CREDENTIAL when accepting a context than
to just be happy using anything in the keytab.

 The real question is whether we *need* to check anything.  If the  
 keytab is unique to PostgreSQL, as I think it should be, then I think  
 the admin should put only the right stuff in the keytab, and no  
 further checks are needed.

While I agree that in general it's best to have a keytab file for each
separate service, there's no guarantee of that being done and using,
say, the 'host' service to allow authentication to PG when the PG
service is 'postgres' isn't right.
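
(With an MIT KDC that's just a matter of, e.g., giving PG its own
keytab holding only its own principal (path and principal invented):

  kadmin -q 'ktadd -k /etc/postgresql/postgres.keytab postgres/db.example.com'

and then pointing the server at that file.)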

 Now it *might* make sense to check that the credential that's  
 accepted actually has some correspondence to what ./configure said.   

Or what's configured in postgresql.conf (as is done for Kerberos
now...).

 If we do do that, then we need to allow for the ways Microsoft mucks  
 with the case of the name.  (Kerberos is supposed to be case  
 sensitive, but Microsoft doesn't work that way.)  In particular I think we 
 may need both "postgres/server" and "POSTGRES/server" in the keytab 
 in order to support the to-be-written native Windows SSPI client at  
 the same time as the current Kerberos 5 and GSSAPI Unix clients.

Supporting multiple, specific, keys might be an interesting challenge,
but I'm not too keen on worrying about it right now regardless.  I'd
also much rather err on the side of overly paranoid than "if it works,
just let it in".  If someone ends up having to support both Windows SSPI
clients and unix Kerberos/GSSAPI clients it's entirely possible to
suggest they just make it POSTGRES and configure the clients
accordingly.

 Now what happens with non-Kerberos 5 mechanisms like SPKM and 
 SPNEGO?  ;-)  I'll have to see, even if it doesn't matter for Java, yet.

Err, SPNEGO's not really a full mechanism itself, it's got sub-mechs of
NTLM and Kerberos and from what I've seen it does the same thing SSPI
does and assumes uppercase.  Again, it's ugly, but not unbearable.  Such
fun things are probably also why mod-auth-kerb for Apache lets you
configure the service name, like most other Kerberos servers...  I've
yet to run into one that will just use any key in the keytab it's given.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [PATCHES] array_accum aggregate

2006-10-13 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 That's not really the flavor of solution I'd like to have.  Ideally,
 it'd actually *work* to write
 
   my_ffunc(my_sfunc(my_sfunc(null, 1), 2))
 
 and get the same result as aggregating over the values 1 and 2.  The
 trick is to make sure that my_sfunc and my_ffunc can only be used
 together.  Maybe we do need a type for each such aggregate ...

In general I like this idea but there are some complications, the main
one being where would the memory be allocated?  I guess we could just
always use the QueryContext.  The other issue is, in the above scenario
is it acceptable to modify the result of my_sfunc(null, 1) in the ",2"
call?  Normally it's unacceptable to modify your input, aggregates get a
special exception when called as an aggregate, but in this case you'd
have to copy the entire custom structure underneath, I'd think.

As for a type for each such aggregate, that seems reasonable to me,
honestly.  I'd like for the type creation to be reasonably simple
though, if possible.  ie: could we just provide NULLs for the
input/output functions instead of having to implement functions that
just return an error?  Or perhaps have a polymorphic function which can
be reused that just returns an error.  Additionally, we'd have to be
able to mark the types as being polymorphic along the same lines as
anyelement/anyarray.  Hopefully that's already easy, but if not it'd be
nice if it could be made easy...
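
(For comparison, the array_append-based definition this would replace
is about as simple as it gets:

  psql -d mydb -c "CREATE AGGREGATE array_accum (anyelement) (
      sfunc = array_append, stype = anyarray, initcond = '{}');"

so hopefully the custom-type flavor doesn't end up much heavier than
that.)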

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [PATCHES] array_accum aggregate

2006-10-13 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Stephen Frost [EMAIL PROTECTED] writes:
  The other issue is, in the above scenario
  is it acceptable to modify the result of my_sfunc(null, 1) in the ', 2'
  call?
 
 Yes, because the only place a nonnull value of the type could have come
 from is a my_sfunc call; since it's a pseudotype, we don't allow it on
 disk.  (We might also need a hack to prevent the type from being used as
 a record-type component ... not sure if that comes for free with being a
 pseudotype currently.)

Ah, ok.

  As for a type for each such aggregate, that seems reasonable to me,
  honestly.
 
 The ugly part is that we'd still need a way for the planner to recognize
 this class of types.

Hrm, true.  I would agree that they would need more memory than most
aggregates.  It seems likely to me that the worst offenders are going to
be of the array_accum type: store all tuples coming in.  Therefore, my
suggestion would be that the memory usage be estimated along those lines
and allow for hashagg when there's enough memory to keep all the tuples
(or rather the portion of the tuple going into the aggregate) in memory
(plus some amount of overhead, maybe 10%).  That doesn't help with how
to get the planner to recognize this class of types though.  I don't
suppose we already have a class framework which the planner uses to
group types which have similar characteristics?

  Additionally, we'd have to be
  able to mark the types as being polymorhpic along the same lines as
  anyelement/anyarray.
 
 What for?

So that the finalfunc can be polymorphic along the lines of my suggested
aaccum_sfunc(anyarray, anyelement) returns anyarray and
aaccum_ffunc(anyarray) returns anyarray.  If the state type isn't
polymorphic then PG won't let me create a function from it which returns
a polymorphic type, aiui.
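
That is, the aggregate definition would look roughly like this, with the
two C functions from my patch doing the real work:

    CREATE AGGREGATE array_accum (
        BASETYPE = anyelement,
        SFUNC = aaccum_sfunc,     -- aaccum_sfunc(anyarray, anyelement)
        STYPE = anyarray,         --   returns anyarray
        FINALFUNC = aaccum_ffunc  -- aaccum_ffunc(anyarray) returns anyarray
    );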

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] array_accum aggregate

2006-10-12 Thread Stephen Frost
* Neil Conway ([EMAIL PROTECTED]) wrote:
 On Wed, 2006-10-11 at 00:51 -0400, Stephen Frost wrote:
   Here, the actual state type for any aggregate call is the array type
  !having the actual input type as elements.  Note: array_accum() is now
  !a built-in aggregate which uses a much more efficient mechanism than
  !that which is provided by array_append; prior users of array_accum()
  !may be pleasantly surprised at the marked improvement for larger arrays.
   </para>
 
 Not worth documenting, I think.

Ok.  Either way is fine with me, honestly.

  *** 15,20 ****
  --- 15,22 ----
    #include "utils/array.h"
    #include "utils/builtins.h"
    #include "utils/lsyscache.h"
  + #include "utils/memutils.h"
  + #include "nodes/execnodes.h"
 
 Include directives should be sorted roughly alphabetically, with the
 exception of postgres.h and system headers.

Ah, yeah, you know I noticed that when I added memutils but wasn't
thinking when I went back and added execnodes (I had been trying to
minimize the number of includes by adding ones I needed one at a time
and seeing if any more were necessary).

  + /* Structure, used by aaccum_sfunc and aaccum_ffunc to
  +  * implement the array_accum() aggregate, for storing 
  +  * pointers to the ArrayBuildState for the array we are 
  +  * building and the MemoryContext in which it is being
  +  * built.  Note that this structure is 
  +  * considered an 'anyarray' externally, which is a
  +  * variable-length datatype, and therefore
  +  * must open with an int32 defining the length. */
 
 Gross.

Indeed. :/

  +   /* Make sure we are being called in an aggregate. */
  +   if (!fcinfo->context || !IsA(fcinfo->context, AggState))
  +       ereport(ERROR,
  +               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  +                errmsg("Can not call aaccum_sfunc as a non-aggregate"),
  +                errhint("Use the array_accum aggregate")));
 
 Per the error message style guidelines, errmsg() should begin with a
 lowercase letter and errhint() should be a complete sentence (so it
 needs punctuation, for example).

Ah, ok, guess I should go read those.

  + Datum
  + aaccum_ffunc(PG_FUNCTION_ARGS)
  + {
  +   aaccum_info *ainfo;
  + 
  +   /* Check if we are passed in a NULL */
  +   if (PG_ARGISNULL(0)) PG_RETURN_ARRAYTYPE_P(NULL);
 
 There is no guarantee that SQL NULL and PG_RETURN_XYZ(NULL) refer to the
 same thing -- use PG_RETURN_NULL() to return a SQL NULL value, or just
 make the function strict.

Huh, alright.  I'll probably just change it to PG_RETURN_NULL().

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] array_accum aggregate

2006-10-12 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 (However, now that we support nulls in arrays, meseems a more consistent
 definition would be that it allows null inputs and just includes them in
 the output.  So probably you do need it non-strict.)

This was my intention.

 I'm inclined to think that this example demonstrates a deficiency in the
 aggregate-function design: there should be a way to declare what we're
 really doing.  But I don't know exactly what that should look like.

I agree and would much rather have a clean solution which works with the
design than one which has to work outside it.  When I first was trying
to decide on the state-type I was looking through the PG catalogs for
essentially a 'complex C type' which translated to a 'void*'.  Perhaps
such a type could be added.  Unless that's considered along the lines of
an 'any' type, it'd cause problems for the polymorphism aspect.

Another alternative would be to provide a separate area for each
aggregate to put any other information it needs.  This would almost
certainly only be available to C functions but would essentially be a
void* which is provided through the AggState structure but tracked by
the aggregator routines and reset for each aggregate function being
run.  If that's acceptable, I don't think it'd be all that difficult to
implement.  With that, aaccum_sfunc and aaccum_ffunc would ignore the
state variable passed to them in favor of their custom structure
available through fcinfo->AggState (I expect they'd just keep the
state variable NULL and be marked non-strict, or set it to some constant
if necessary).  The pointer would have to be tracked somewhere and then
copied in/out on each call, but that doesn't seem too difficult to me.
After all, the state variable is already being tracked somewhere; this
would just sit next to it, in my head anyway.

I've got some time this weekend and would be happy to take a shot at
the second proposal if that's generally acceptable.

Thanks,

Stephen


signature.asc
Description: Digital signature


[PATCHES] array_accum aggregate

2006-10-10 Thread Stephen Frost
Greetings,

  Please find below a patch to add the array_accum aggregate as a
  built-in using two new C functions defined in array_userfuncs.c.
  These functions simply expose the pre-existing efficient array
  building routines used elsewhere in the backend (accumArrayResult
  and makeArrayResult, specifically).   An array_accum aggregate has
  existed in the documentation for quite some time using the
  inefficient (for larger arrays) array_append routine.  The
  documentation around the example has also been updated to reflect
  the addition of this built-in.

  Documentation and a regression test are also included.

Thanks,

Stephen

Index: doc/src/sgml/func.sgml
===
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/func.sgml,v
retrieving revision 1.343
diff -c -r1.343 func.sgml
*** doc/src/sgml/func.sgml  1 Oct 2006 18:54:31 -   1.343
--- doc/src/sgml/func.sgml  11 Oct 2006 04:38:45 -
***
*** 7851,7856 ****
--- 7851,7872 ----
   <row>
    <entry>
     <indexterm>
+     <primary>array_accum</primary>
+    </indexterm>
+    <function>array_accum(<replaceable class="parameter">anyelement</replaceable>)</function>
+   </entry>
+   <entry>
+    <type>anyelement</type>
+   </entry>
+   <entry>
+    array of elements of same type as argument type
+   </entry>
+   <entry>an array of all input elements (NULLs, non-nulls, and duplicates)</entry>
+  </row>
+ 
+  <row>
+   <entry>
+    <indexterm>
      <primary>average</primary>
     </indexterm>
     <function>avg(<replaceable class="parameter">expression</replaceable>)</function>
Index: doc/src/sgml/xaggr.sgml
===
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/xaggr.sgml,v
retrieving revision 1.33
diff -c -r1.33 xaggr.sgml
*** doc/src/sgml/xaggr.sgml 16 Sep 2006 00:30:16 -  1.33
--- doc/src/sgml/xaggr.sgml 11 Oct 2006 04:38:45 -
***
*** 132,138 ****
  </programlisting>
  
     Here, the actual state type for any aggregate call is the array type
!    having the actual input type as elements.
    </para>
  
    <para>
--- 132,141 ----
  </programlisting>
  
     Here, the actual state type for any aggregate call is the array type
!    having the actual input type as elements.  Note: array_accum() is now
!    a built-in aggregate which uses a much more efficient mechanism than
!    that which is provided by array_append; prior users of array_accum()
!    may be pleasantly surprised at the marked improvement for larger arrays.
    </para>
  
    <para>
Index: src/backend/utils/adt/array_userfuncs.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/array_userfuncs.c,v
retrieving revision 1.20
diff -c -r1.20 array_userfuncs.c
*** src/backend/utils/adt/array_userfuncs.c 14 Jul 2006 14:52:23 -  
1.20
--- src/backend/utils/adt/array_userfuncs.c 11 Oct 2006 04:38:46 -
***
*** 15,20 ****
--- 15,22 ----
  #include "utils/array.h"
  #include "utils/builtins.h"
  #include "utils/lsyscache.h"
+ #include "utils/memutils.h"
+ #include "nodes/execnodes.h"
  
  
  
  /*-----------------------------------------------------------------------
***
*** 399,404 
--- 401,516 
PG_RETURN_ARRAYTYPE_P(result);
  }
  
+ /* Structure, used by aaccum_sfunc and aaccum_ffunc to
+  * implement the array_accum() aggregate, for storing
+  * pointers to the ArrayBuildState for the array we are
+  * building and the MemoryContext in which it is being
+  * built.  Note that this structure is
+  * considered an 'anyarray' externally, which is a
+  * variable-length datatype, and therefore
+  * must open with an int32 defining the length. */
+ typedef struct {
+   int32            vl_len;
+   ArrayBuildState *astate;
+   MemoryContext    arrctx;
+ } aaccum_info;
+ 
+ /*-----------------------------------------------------------------------
+  * aaccum_sfunc :
+  *  State transition function for the array_accum() aggregate,
+  *  efficiently builds an in-memory array by working in blocks and
+  *  minimizing realloc()'s and copying of the data in general.
+  *  Creates a separate memory context attached to the AggContext into
+  *  which the array is built.  That context is free'd when the final
+  *  function is called (aaccum_ffunc).  accumArrayResult() does all
+  *  the heavy lifting here, this is really just a glue function.
+  *
+  */
+ Datum
+ aaccum_sfunc(PG_FUNCTION_ARGS)
+ {
+   aaccum_info *ainfo;
+   AggState    *aggstate;
+ 
+   /* Make sure we are being called in an aggregate. */
+   if (!fcinfo->context || !IsA(fcinfo->context, AggState))
+ 

[PATCHES] BUG #2246: Only call pg_fe_getauthname if none given

2006-02-15 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Right offhand I like the idea of pushing it into connectOptions2 --- can
 you experiment with that?  Seems like there is no reason to call
 Kerberos if the user supplies the name to connect as.

Patch attached.  After looking through the code around this I discovered
that conninfo_parse is also called by PQconndefaults.  This patch
changes the 'default' returned for user to NULL (instead of whatever
pg_fe_getauthname returns).  This is probably better than not calling
Kerberos in pg_fe_getauthname and potentially returning the wrong thing
(if the username doesn't match the princ, a common situation), but it
might break existing applications.  Are those applications wrong to be
asking libpq to provide what it thinks the username is when asking for
the defaults?  I'd say probably yes, but then we don't provide another
way for them to get it either.

Patch tested w/ 'trust', 'ident', 'md5' and 'krb5' methods from psql.

Enjoy,

Stephen
Index: src/interfaces/libpq/fe-connect.c
===
RCS file: /projects/cvsroot/pgsql/src/interfaces/libpq/fe-connect.c,v
retrieving revision 1.326
diff -c -r1.326 fe-connect.c
*** src/interfaces/libpq/fe-connect.c   13 Feb 2006 22:33:57 -  1.326
--- src/interfaces/libpq/fe-connect.c   15 Feb 2006 18:21:06 -
***
*** 96,105 
   * fallback is available. If after all no value can be determined
   * for an option, an error is returned.
   *
-  * The value for the username is treated specially in conninfo_parse.
-  * If the Compiled-in resource is specified as a NULL value, the
-  * user is determined by pg_fe_getauthname().
-  *
   * The Label and Disp-Char entries are provided for applications that
   * want to use PQconndefaults() to create a generic database connection
   * dialog. Disp-Char is defined as follows:
--- 96,101 
***
*** 423,434 
--- 419,448 
   *
   * Compute derived connection options after absorbing all user-supplied info.
   *
+  * The value for the username is treated specially in connectOptions2.
+  * If the Compiled-in resource is specified as a NULL value and the user
+  * doesn't provide a username to use then the user is determined by 
+  * calling pg_fe_getauthname().
+  *
   * Returns true if OK, false if trouble (in which case errorMessage is set
   * and so is conn-status).
   */
  static bool
  connectOptions2(PGconn *conn)
  {
+   char        errortmp[PQERRORMSG_LENGTH];
+ 
+   /*
+    * Find username to use if none given.  This may call
+    * additional helpers (ie: krb5) if those auth methods
+    * are compiled in.
+    */
+   if (conn->pguser == NULL || conn->pguser[0] == '\0')
+   {
+       conn->pguser = pg_fe_getauthname(errortmp);
+       /* note any error message is thrown away */
+   }
+ 
/*
 * If database name was not given, default it to equal user name
 */
***
*** 2505,2511 
char   *cp2;
PQconninfoOption *options;
PQconninfoOption *option;
-   char        errortmp[PQERRORMSG_LENGTH];
  
/* Make a working copy of PQconninfoOptions */
options = malloc(sizeof(PQconninfoOptions));
--- 2519,2524 
***
*** 2722,2737 
}
continue;
}
- 
-   /*
-* Special handling for user
-*/
-   if (strcmp(option->keyword, "user") == 0)
-   {
-       option->val = pg_fe_getauthname(errortmp);
-   /* note any error message is thrown away */
-   continue;
-   }
}
  
return options;
--- 2735,2740 



Re: [PATCHES] BUG #2246: Only call pg_fe_getauthname if none given

2006-02-15 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 This is probably not a good idea --- changing the API behavior in
 pursuit of saving a few cycles is just going to get people mad at us.

Fair enough.

 I think we'd have to refactor the code so that PQsetdbLogin gets a
 PQconninfoOption array, overrides values *in that array*, then calls the
 fallback-substitution code etc.  Not sure if it's worth the trouble.
 The extra complexity of searching the array for values to override could
 eat up the cycles we're hoping to save, too :-(

Perhaps I'm missing something obvious (and if so, I'm sorry) but
couldn't we just build up the character array in PQsetdbLogin to be
passed in to connectOptions1?  If we do that we could probably also
merge the two connectOptions...  That would simplify things a great deal
I think and would also avoid the extra processing to pick up the
'defaults'...

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] [HACKERS] Krb5 multiple DB connections

2006-02-09 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Stephen Frost [EMAIL PROTECTED] writes:
  I have little idea of how expensive the operations called by
  pg_krb5_init really are.  If they are expensive then it'd probably
  make sense to keep the current static variables but treat 'em as a
  one-element cache, ie, recompute if a new user name is being demanded.
  If not, we ought to be able to simplify some things.
 
  We'd have to recompute based on the KRB5CCNAME environment variable
  changing, which is certainly an option.  It's not necessarily the case
  that the username is changing, possibly just the cache.
 
 Hm, apparently I completely misunderstand the problem here.  What I
 thought the bug was was that the cache wasn't recomputed given an
 attempt to connect as a different Postgres username than the first
 time.  If that's not the issue, then what is?

The specific problem which I and the original reporter ran into is this:

KRB5CCNAME=/tmp/krb5cc_apache_aev0kF
pg_connect() -- works fine
pg_close() -- works fine
rm /tmp/krb5cc_apache_aev0kF
KRB5CCNAME=/tmp/krb5cc_apache_cVMRtA
pg_connect() -- Doesn't work, Kerberos error is "no credentials cache"

What's happening here is that for every connection to apache by the
client a new credentials cache is created, and then destroyed when the
connection closes.  When using PHP (ie: phppgadmin) and mod_php (as is
common) the Apache process is the one actually making the connection to
the database and a given Apache process usually serves multiple requests
in its lifetime, sometimes to the same user, sometimes to different
users.

The static variables being used are for: krb5_init_context,
krb5_cc_default, krb5_cc_get_principal, and krb5_unparse_name.
Technically, between one connection and the next, krb5_cc_default,
krb5_cc_get_principal and krb5_unparse_name could reasonably return
different values.  krb5_init_context is pretty unlikely to change as
that would mean /etc/krb5.conf changed.  Looking through the krb5 source
code it appears that the only one which checks for something existing
is krb5_cc_default_name (called by krb5_cc_default), which will just 
return the current ccache name if one has been set.
(src/lib/krb5/os/ccdefname.c:278, krb5-1.4.3)

We initially brought up this issue with the Kerberos folks actually:
http://pch.mit.edu/pipermail/kerberos/2006-February/009225.html

They pretty clearly felt that the application was responsible for
handling the cache in the event it changes.  Unfortunately, it's not the
application which is talking to Kerberos but another library in this
case which doesn't expose the Kerberos API to the application in such a
way to allow the application to notify Kerberos of the cache change.

Looking through the Kerberos API again it looks like it might be
possible to use krb5_cc_set_default_name(context, NULL) to force
krb5_cc_default_name() (from krb5_cc_default()) to re-find the cache.
Finding the cache again is reasonably inexpensive.  I've tried a couple
of things along these lines now but I haven't found a workable solution
which doesn't reinitialize the main Kerberos context, unfortunately.
I'll keep working on it as time allows though honestly I don't believe
it's generally terribly expensive to reinitialize the context...
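
The shape of what I've been trying is roughly this (just a sketch, not
tested, error handling omitted):

    /* Force libkrb5 to re-resolve the default ccache from the current
     * environment (KRB5CCNAME) instead of reusing the name it stashed
     * away on the first call. */
    krb5_cc_set_default_name(pg_krb5_context, NULL);
    retval = krb5_cc_default(pg_krb5_context, &pg_krb5_ccache);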

Sorry it took so long to reply, running down the paths through the 
various libraries takes a bit of time and I was really hoping to be able
to suggest an alternative solution using krb5_cc_set_default_name.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] [HACKERS] Krb5 multiple DB connections

2006-02-08 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Stephen Frost [EMAIL PROTECTED] writes:
The attached patch fixes a bug which was originally brought up in May
of 2002 in this thread:
 
 Now that I've looked at it, I find this patch seems fairly wrongheaded.
 AFAICS the entire point of the original coding is to allow the setup
 work needed to create the krb5_context etc to be amortized across
 multiple connections.  The patch destroys that advantage, but yet keeps
 as much as it can of the notational cruft induced by the original
 design -- for instance, there seems little point in the
 pg_krb5_initialised flag if we aren't ever going to have any
 pre-initialized state.

I'm honestly not entirely sure I agree about that being the point of the
original coding, but regardless, the crux of the problem here is that
there's no way to get libpq to use a cache other than the one it's
initialized with for a given session.  The part of the Kerberos API
which supports that isn't exposed in any way beyond the KRB5CCNAME
environment variable, and the call to ask for the 'default' ccache is
made with the static variable the second time, which ignores the
request when there's an already-valid (it thinks) ccache.

 I have little idea of how expensive the operations called by
 pg_krb5_init really are.  If they are expensive then it'd probably
 make sense to keep the current static variables but treat 'em as a
 one-element cache, ie, recompute if a new user name is being demanded.
 If not, we ought to be able to simplify some things.

We'd have to recompute based on the KRB5CCNAME environment variable
changing, which is certainly an option.  It's not necessarily the case
that the username is changing, possibly just the cache.  Additionally,
the calls themselves are not very expensive when being called on an
existing cache, the most expensive thing is reaching out to the KDC to
get a new service ticket which will either need to be done, or won't,
depending on if a valid service ticket already exists in the cache or
not.

 Another point here is how all this interacts with thread safety.
 If we get rid of the static variables, do we still need the
 pglock_thread() operations?

Good question, I'm afraid probably not.  I'd have to look through it
again but last I checked MIT Kerberos prior to 1.4 (and I'm not 100%
sure it's resolved in 1.4) wasn't threadsafe itself.

I'd certainly be happy to rework the patch based on these comments, of
course.  Honestly, I'm pretty sure the original patch was intended to be
minimal (and is for the most part).  These changes would introduce more
logic but if that's alright I'd be happy to do it.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] [HACKERS] Krb5 multiple DB connections

2006-02-06 Thread Stephen Frost
Greetings,

  The attached patch fixes a bug which was originally brought up in May
  of 2002 in this thread:

  http://archives.postgresql.org/pgsql-interfaces/2002-05/msg00083.php

  The original bug reporter also supplied a patch to fix the problem:

  http://archives.postgresql.org/pgsql-interfaces/2002-05/msg00090.php

  I ran into exactly the same issue using 8.1.2.  I've updated the
  original patch to work for HEAD and 8.1.2.  I've tested the patch on
  both HEAD and 8.1.2, both at home and at work, and it works quite
  nicely.  In fact, I hope to have a patch to phppgadmin which will make
  it properly handle Kerberized logins.

Thanks,

Stephen
Index: src/interfaces/libpq/fe-auth.c
===
RCS file: /projects/cvsroot/pgsql/src/interfaces/libpq/fe-auth.c,v
retrieving revision 1.110
diff -c -r1.110 fe-auth.c
*** src/interfaces/libpq/fe-auth.c  26 Dec 2005 14:58:05 -  1.110
--- src/interfaces/libpq/fe-auth.c  5 Feb 2006 20:03:21 -
***
*** 101,122 
   * Various krb5 state which is not connection specific, and a flag to
   * indicate whether we have initialised it yet.
   */
  static int        pg_krb5_initialised;
  static krb5_context pg_krb5_context;
  static krb5_ccache pg_krb5_ccache;
  static krb5_principal pg_krb5_client;
  static char *pg_krb5_name;
  
  
  static int
! pg_krb5_init(char *PQerrormsg)
  {
krb5_error_code retval;
  
!   if (pg_krb5_initialised)
return STATUS_OK;
  
!   retval = krb5_init_context(&pg_krb5_context);
if (retval)
{
snprintf(PQerrormsg, PQERRORMSG_LENGTH,
--- 101,133 
   * Various krb5 state which is not connection specific, and a flag to
   * indicate whether we have initialised it yet.
   */
+ /* 
  static int        pg_krb5_initialised;
  static krb5_context pg_krb5_context;
  static krb5_ccache pg_krb5_ccache;
  static krb5_principal pg_krb5_client;
  static char *pg_krb5_name;
+ */
+ 
+ struct krb5_info
+ {
+   int             pg_krb5_initialised;
+   krb5_context    pg_krb5_context;
+   krb5_ccache     pg_krb5_ccache;
+   krb5_principal  pg_krb5_client;
+   char           *pg_krb5_name;
+ };
  
  
  static int
! pg_krb5_init(char *PQerrormsg, struct krb5_info *info)
  {
krb5_error_code retval;
  
!   if (info->pg_krb5_initialised)
return STATUS_OK;
  
!   retval = krb5_init_context(&(info->pg_krb5_context));
if (retval)
{
snprintf(PQerrormsg, PQERRORMSG_LENGTH,
***
*** 125,170 
return STATUS_ERROR;
}
  
!   retval = krb5_cc_default(pg_krb5_context, &pg_krb5_ccache);
    if (retval)
    {
        snprintf(PQerrormsg, PQERRORMSG_LENGTH,
                 "pg_krb5_init: krb5_cc_default: %s\n",
                 error_message(retval));
!       krb5_free_context(pg_krb5_context);
        return STATUS_ERROR;
    }
  
!   retval = krb5_cc_get_principal(pg_krb5_context, pg_krb5_ccache,
!                                  &pg_krb5_client);
    if (retval)
    {
        snprintf(PQerrormsg, PQERRORMSG_LENGTH,
                 "pg_krb5_init: krb5_cc_get_principal: %s\n",
                 error_message(retval));
!       krb5_cc_close(pg_krb5_context, pg_krb5_ccache);
!       krb5_free_context(pg_krb5_context);
        return STATUS_ERROR;
    }
  
!   retval = krb5_unparse_name(pg_krb5_context, pg_krb5_client, &pg_krb5_name);
    if (retval)
    {
        snprintf(PQerrormsg, PQERRORMSG_LENGTH,
                 "pg_krb5_init: krb5_unparse_name: %s\n",
                 error_message(retval));
!       krb5_free_principal(pg_krb5_context, pg_krb5_client);
!       krb5_cc_close(pg_krb5_context, pg_krb5_ccache);
!       krb5_free_context(pg_krb5_context);
        return STATUS_ERROR;
    }
  
!   pg_krb5_name = pg_an_to_ln(pg_krb5_name);
  
!   pg_krb5_initialised = 1;
    return STATUS_OK;
  }
  
  
  /*
   * pg_krb5_authname -- returns a pointer to static space containing whatever
--- 136,191 
return STATUS_ERROR;
}
  
!   retval = krb5_cc_default(info->pg_krb5_context, &(info->pg_krb5_ccache));
    if (retval)
    {
        snprintf(PQerrormsg, PQERRORMSG_LENGTH,
                 "pg_krb5_init: krb5_cc_default: %s\n",
                 error_message(retval));
!       krb5_free_context(info->pg_krb5_context);
        return STATUS_ERROR;
    }
  
!   retval = krb5_cc_get_principal(info->pg_krb5_context, info->pg_krb5_ccache,
!                                  &(info->pg_krb5_client));

Re: [PATCHES] pg_restore COPY error handling

2006-02-05 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Stephen Frost [EMAIL PROTECTED] writes:
  I believe the attached patch does this now.  Under my test case it
  correctly handled things.  I'm certainly happier with it this way and
  apologize for not realizing this better approach sooner.  Please
  comment.
 
 Applied (with trivial stylistic changes) as far back as 8.0, which
 was the first release that would try to continue after an error.

Great, thanks!  Now if I can convince you to look at the Kerberos patch
I posted on -hackers... ;)

Thanks again,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] pg_restore COPY error handling

2006-02-02 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 I agree.  I wonder if it wouldn't be cleaner to pass the information in
 the other direction, ie, send a boolean down to PrintTocData saying "you
 are sending SQL commands" or "you are sending COPY data".  Then, instead
 of depending only on the libpq state to decide what to do in
 ExecuteSqlCommandBuf, we could cross-check: if we're sending SQL data
 and the libpq state is wrong, just discard the line.

I believe the attached patch does this now.  Under my test case it
correctly handled things.  I'm certainly happier with it this way and
apologize for not realizing this better approach sooner.  Please
comment.

Thanks!

Stephen
Index: src/bin/pg_dump/pg_backup_archiver.c
===
RCS file: /projects/cvsroot/pgsql/src/bin/pg_dump/pg_backup_archiver.c,v
retrieving revision 1.119
diff -c -r1.119 pg_backup_archiver.c
*** src/bin/pg_dump/pg_backup_archiver.c    21 Jan 2006 02:16:20 -  
1.119
--- src/bin/pg_dump/pg_backup_archiver.c    3 Feb 2006 02:47:33 -
***
*** 330,339 
--- 330,345 
 * with libpq.
 */
            if (te->copyStmt && strlen(te->copyStmt) > 0)
+           {
                ahprintf(AH, "%s", te->copyStmt);
+               AH->writingCopy = 1;
+           }
  
            (*AH->PrintTocDataPtr) (AH, te, ropt);
  
+           if (te->copyStmt && strlen(te->copyStmt) > 0)
+               AH->writingCopy = 0;
+ 
            _enableTriggersIfNecessary(AH, te, ropt);
}
}
***
*** 1590,1595 
--- 1596,1602 
    AH->compression = compression;
  
    AH->pgCopyBuf = createPQExpBuffer();
+   AH->writingCopy = 0;
    AH->sqlBuf = createPQExpBuffer();
  
/* Open stdout with no compression for AH output handle */
Index: src/bin/pg_dump/pg_backup_archiver.h
===
RCS file: /projects/cvsroot/pgsql/src/bin/pg_dump/pg_backup_archiver.h,v
retrieving revision 1.68
diff -c -r1.68 pg_backup_archiver.h
*** src/bin/pg_dump/pg_backup_archiver.h    15 Oct 2005 02:49:38 -  
1.68
--- src/bin/pg_dump/pg_backup_archiver.h    3 Feb 2006 02:47:33 -
***
*** 245,250 
--- 245,251 
  
    int         loFd;           /* BLOB fd */
    int         writingBlob;    /* Flag */
+   int         writingCopy;    /* Flag to indicate if we are in COPY mode */
    int         blobCount;      /* # of blobs restored */
  
char   *fSpec;  /* Archive File Spec */
Index: src/bin/pg_dump/pg_backup_db.c
===
RCS file: /projects/cvsroot/pgsql/src/bin/pg_dump/pg_backup_db.c,v
retrieving revision 1.66
diff -c -r1.66 pg_backup_db.c
*** src/bin/pg_dump/pg_backup_db.c  15 Oct 2005 02:49:38 -  1.66
--- src/bin/pg_dump/pg_backup_db.c  3 Feb 2006 02:47:33 -
***
*** 389,395 
 *------
 */
  
!   if (PQputline(AH->connection, AH->pgCopyBuf->data) != 0)
        die_horribly(AH, modulename, "error returned by PQputline\n");
  
    resetPQExpBuffer(AH->pgCopyBuf);
--- 389,395 ----
 *------
 */
  
!   if (AH->pgCopyIn && PQputline(AH->connection, AH->pgCopyBuf->data) != 0)
        die_horribly(AH, modulename, "error returned by PQputline\n");
  
    resetPQExpBuffer(AH->pgCopyBuf);
***
*** 400,406 
  
    if (isEnd)
    {
!       if (PQendcopy(AH->connection) != 0)
            die_horribly(AH, modulename, "error returned by PQendcopy\n");
  
        AH->pgCopyIn = 0;
--- 400,406 ----
  
    if (isEnd)
    {
!       if (AH->pgCopyIn && PQendcopy(AH->connection) != 0)
            die_horribly(AH, modulename, "error returned by PQendcopy\n");
  
        AH->pgCopyIn = 0;
***
*** 615,621 ****
    /* Could switch between command and COPY IN mode at each line */
    while (qry < eos)
    {
!       if (AH->pgCopyIn)
            qry = _sendCopyLine(AH, qry, eos);
        else
            qry = _sendSQLLine(AH, qry, eos);
--- 615,624 ----
    /* Could switch between command and COPY IN mode at each line */
    while (qry < eos)
    {
!       /* If we are in CopyIn mode *or* 

Re: [PATCHES] pg_restore COPY error handling

2006-02-01 Thread Stephen Frost
* Bruce Momjian (pgman@candle.pha.pa.us) wrote:
 Stephen Frost wrote:
 -- Start of PGP signed section.
  * Bruce Momjian (pgman@candle.pha.pa.us) wrote:
   Stephen Frost wrote:
Great!  It'd be really nice to have this fix in 8.1.3... :)
   
   No, it will not be in 8.1.3.  It is a new feature.
  
  I'd really appriciate it if you could review my post to pgsql-bugs,
  Message-ID: [EMAIL PROTECTED].  If this patch is
  considered a new feature then I'd like to either revisit that decision
  or be given some direction on what a bug-fix patch for the grows-to-3G
  bug would look like.
 
 Oh, so when it skips the "\.", it reads the rest of the file and bombs
 out?  I thought it just continued fine.  How is that failure happening?

It doesn't actually bomb out either.  On the system I was working with
what happened was that it hung after reading in the COPY data.  It
looked like it got stuck somehow while trying to parse the data as an
SQL command.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] pg_restore COPY error handling

2006-02-01 Thread Stephen Frost
* Andrew Dunstan ([EMAIL PROTECTED]) wrote:
 this is a hopeless way of giving a reference. Many users don't keep list
 emails. If you want to refer to a previous post you should give a
 reference to the web archives.

Sorry, I'm actually pretty used to using Message-IDs for references (we
do it a lot on the Debian lists).  I'll try to be better in the future,
though honestly I'm not really a big fan of the Postgres mailing list
archive setup. :/

 I assume you are referring to this post:
 http://archives.postgresql.org/pgsql-bugs/2006-01/msg00188.php

Yup, that's the one, thanks.

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] pg_restore COPY error handling

2006-02-01 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Bruce Momjian pgman@candle.pha.pa.us writes:
  Andrew Dunstan wrote:
  I assume you are referring to this post:
  http://archives.postgresql.org/pgsql-bugs/2006-01/msg00188.php
 
  OK, that helps.  The solution is to not do that, meaning install
  postgis before the restore or something.  Anyway, adding the patch seems
  like too large a risk for a minor release, especially since you are the
  first person to complain about it that I remember.
 
 I don't think you should see this as something specific to PostGIS.
 If I interpret Stephen's report properly, any sort of failure during
 COPY IN would lead to undesirable behavior in pg_restore.

I'm reasonably confident this is exactly the case.

 This is not super surprising because the original design approach for
 pg_restore was "bomb out on any sort of difficulty whatsoever".  That
 was justly complained about, and now it will try to keep going after
 SQL-command errors ... but it sounds like the COPY-data processing
 part didn't get fixed properly.

Exactly so.  Possibly because it appears to be non-trivial to detect
when it was a *COPY* command which failed (to differentiate it from some
other command).  What I did was check for optional whitespace followed
by 'COPY'.  I'd be happy to change that if anyone has a better
suggestion though.  There might be a way to pass that information down
into pg_backup_db but I don't think it's available at that level
currently and I didn't see any way to get the answer from libpq about
what *kind* of command failed.

 I take no position on whether Stephen's patch is any good --- I haven't
 looked at it --- but if he's correct about the problem then I agree it's
 a bug fix.  Before deciding whether it deserves to be back-patched, we
 at least ought to look closely enough to characterize the situations in
 which pg_restore will fail.

I have some level of confidence that this is the only case along these
lines because it's the only case where pg_restore isn't dealing with SQL
commands but with a dataset which has to be handled in a special way.
pg_restore checks if the command sent was a successful COPY command and
then treats everything after it in a special way; it doesn't do that for
any other commands...

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] pg_restore COPY error handling

2006-02-01 Thread Stephen Frost
* Andrew Dunstan ([EMAIL PROTECTED]) wrote:
 Tom Lane said:
  Also, it might be possible to depend on whether libpq has entered the
  copy in state.  I'm not sure this works very well, because if there's
  an error in the COPY command itself (not in the following data) then we
  probably don't ever get into copy in state.
 
 Could we not refrain from sending data if we detect that we are not in copy
 in state?

You have to know at that level if it's data you're getting or an SQL
statement though.  SQL statements can fail and things move along
happily.  Essentially what my patch does is detect when a COPY command
fails and then circular-file the data that comes in after it from the
upper-level.  When I started to look at the way pg_restore worked I had
initially thought I could make the higher-level code realize that the
COPY command failed through a return-code and have it not pass the data
down but the return codes didn't appear to be very well defined to
handle that kind of communication...  (As in, it looked like trying to
make that happen would break other things), I can look at it again
though.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] pg_restore COPY error handling

2006-02-01 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 ISTM you should be depending on the archive structure: at some level,
 at least, pg_restore knows darn well whether it is dealing with table
 data or SQL commands.

Alright, to do this, ExecuteSqlCommandBuf would need to be modified to
return an error-code other than 1 for when a command fails.  Thankfully,
only pg_backup_archiver.c uses ExecuteSqlCommandBuf, in ahwrite.
ahwrite would then need to return a value when there's an error (at the
moment it's defined to return int but everything appears to ignore its
return value).  We'd then need ahprintf to return a negative value when
it fails (at the moment it returns 'cnt' which appears to be the size of
what was written, except that nothing looks at the return value of
ahprintf either).

Once we have ahprintf returning a negative value when it fails, we can
modify pg_backup_archiver.c around line 332 to check if the ahprintf
failed for a COPY statement and if so to skip the:
(*AH-PrintTocDataPtr) (AH, te, ropt); // on line 335.

I'd be happy to work this up, and I think it would work, but it seems
kind of ugly since then we'd have ahwrite and ahprintf returning error
codes which in 99% of the cases wouldn't be checked which doesn't seem
terribly clean. :/
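
To illustrate, the check around line 332 of pg_backup_archiver.c would
end up looking something like this ('failed' is just a stand-in for
whatever negative-return convention we'd settle on):

    /* If the COPY statement itself failed, skip sending the data lines */
    failed = (ahprintf(AH, "%s", te->copyStmt) < 0);

    if (!failed)
        (*AH->PrintTocDataPtr) (AH, te, ropt);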

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] pg_restore COPY error handling

2006-02-01 Thread Stephen Frost
* Stephen Frost ([EMAIL PROTECTED]) wrote:
 * Tom Lane ([EMAIL PROTECTED]) wrote:
  ISTM you should be depending on the archive structure: at some level,
  at least, pg_restore knows darn well whether it is dealing with table
  data or SQL commands.
 
[...]
 I'd be happy to work this up, and I think it would work, but it seems
 kind of ugly since then we'd have ahwrite and ahprintf returning error
 codes which in 99% of the cases wouldn't be checked which doesn't seem
 terribly clean. :/

Looking at this again, I'm not 100% sure it'd work quite that cleanly.
I'm not sure you can just skip that PrintTocDataPtr call, depending on
how the input is coming in...  There might be a way to modify
_PrintData (in pg_backup_custom.c, and the others, which is the
function behind PrintTocDataPtr) to somehow check an AH variable which
essentially says "the data command failed, just ignore the input", and
we'd need to set that AH variable somewhere else, perhaps using the
return value setup I described before...

All of this is sounding rather invasive though.  I have to admit that
I'm not exactly a big fan of the pg_dump/pg_restore modular system. :/

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] pg_restore COPY error handling

2006-02-01 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Stephen Frost [EMAIL PROTECTED] writes:
  * Tom Lane ([EMAIL PROTECTED]) wrote:
  This is not super surprising because the original design approach for
  pg_restore was "bomb out on any sort of difficulty whatsoever".  That
  was justly complained about, and now it will try to keep going after
  SQL-command errors ... but it sounds like the COPY-data processing
  part didn't get fixed properly.
 
  Exactly so.  Possibly because it appears to be non-trivial to detect
  when it was a *COPY* command which failed (to differentiate it from some
  other command).  What I did was check for whitespace followed by 'COPY'.  I'd be
  happy to change that if anyone has a better suggestion though.
 
 ISTM you should be depending on the archive structure: at some level,
 at least, pg_restore knows darn well whether it is dealing with table
 data or SQL commands.

Right, it does higher up in the code but I don't believe that knowledge
is passed down to the level it needs to be because it wasn't necessary
before and the module API didn't include a way to pass that information
into a given module.  I expect this is why the code currently depends on 
libpq to tell it when it has entered a 'copy in' state.  I'll look into 
this though and see just how invasive it would be to get the information 
to that level.

 Also, it might be possible to depend on whether libpq has entered the
 copy in state.  I'm not sure this works very well, because if there's
 an error in the COPY command itself (not in the following data) then we
 probably don't ever get into copy in state.

This is what the code currently depends on, and libpq (properly, imv)
doesn't enter the 'copy in' state when the COPY command itself fails.
That's exactly how we end up in this situation where pg_restore thinks
it's looking at a SQL command (because libpq said it wasn't in a COPY
state) when it's really getting COPY data from the higher level in the
code.
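
For reference, the libpq-level check being talked about amounts to just
this (simplified sketch):

    PGresult *res = PQexec(AH->connection, te->copyStmt);

    if (PQresultStatus(res) == PGRES_COPY_IN)
    {
        /* COPY accepted: the lines which follow are data for PQputline() */
    }
    else
    {
        /* COPY itself failed: libpq never enters the copy-in state, so
         * the data lines which follow must be discarded rather than
         * executed as SQL */
    }
    PQclear(res);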

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] pg_restore COPY error handling

2006-01-28 Thread Stephen Frost
Bruce, et al,

* Bruce Momjian (pgman@candle.pha.pa.us) wrote:
 This needs a little style cleanup but I can do that.

Thanks!

 Your patch has been added to the PostgreSQL unapplied patches list at:
 
   http://momjian.postgresql.org/cgi-bin/pgpatches
 
 It will be applied as soon as one of the PostgreSQL committers reviews
 and approves it.

Great!  It'd be really nice to have this fix in 8.1.3... :)

Thanks again,

Stephen

 ---
 Stephen Frost wrote:
  * Stephen Frost ([EMAIL PROTECTED]) wrote:
   Needs to be changed to handle whitespace in front of the actual 'COPY',
   unless someone else has a better idea.  This should be reasonably
   trivial to do though...  If you'd like me to make that change and send
   in a new patch, just let me know.
  
  Fixed, using isspace().  Also added an output message to make it a bit
  clearer when a failed COPY has been detected, etc.
  
  Updated patch attached.
  
  Thanks,
  
  Stephen
 
 [ Attachment, skipping... ]
 
  
 
 -- 
   Bruce Momjian|  http://candle.pha.pa.us
   pgman@candle.pha.pa.us   |  (610) 359-1001
   +  If your life is a hard drive, |  13 Roberts Road
   +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073


signature.asc
Description: Digital signature


[PATCHES] pg_restore COPY error handling

2006-01-21 Thread Stephen Frost
* Stephen Frost ([EMAIL PROTECTED]) wrote:
 Needs to be changed to handle whitespace in front of the actual 'COPY',
 unless someone else has a better idea.  This should be reasonably
 trivial to do though...  If you'd like me to make that change and send
 in a new patch, just let me know.

Fixed, using isspace().  Also added an output message to make it a bit
clearer when a failed COPY has been detected, etc.

Updated patch attached.

Thanks,

Stephen
Index: src/bin/pg_dump/pg_backup_db.c
===
RCS file: /projects/cvsroot/pgsql/src/bin/pg_dump/pg_backup_db.c,v
retrieving revision 1.66
diff -c -r1.66 pg_backup_db.c
*** src/bin/pg_dump/pg_backup_db.c  15 Oct 2005 02:49:38 -  1.66
--- src/bin/pg_dump/pg_backup_db.c  21 Jan 2006 19:54:56 -
***
*** 292,297 
--- 292,298 
    PGconn     *conn = AH->connection;
    PGresult   *res;
    char        errStmt[DB_MAX_ERR_STMT];
+   int         wsoffset = 0;
  
    /* fprintf(stderr, "Executing: '%s'\n\n", qry->data); */
    res = PQexec(conn, qry->data);
***
*** 306,311 
--- 307,317 
}
else
{
+   /* Catch that this is a failed copy command, and
+    * set pgCopyIn accordingly */
+   while (isspace(qry->data[wsoffset])) wsoffset++;
+   if (strncasecmp(qry->data+wsoffset, "COPY ", 5) == 0)
+       AH->pgCopyIn = -1;
+ 
    strncpy(errStmt, qry->data, DB_MAX_ERR_STMT);
    if (errStmt[DB_MAX_ERR_STMT - 1] != '\0')
    {
***
*** 317,322 ****
--- 323,330 ----
        warn_or_die_horribly(AH, modulename, "%s: %sCommand was: %s\n",
                             desc, PQerrorMessage(AH->connection),
                             errStmt);
+ 
+       if (AH->pgCopyIn == -1)
+           write_msg(NULL, "COPY failed, skipping COPY data.\n");
}
}
  
***
*** 389,395 
 *------
 */
  
!   if (PQputline(AH->connection, AH->pgCopyBuf->data) != 0)
        die_horribly(AH, modulename, "error returned by PQputline\n");
  
    resetPQExpBuffer(AH->pgCopyBuf);
--- 397,405 ----
 *------
 */
  
!   /* If this is a failed copy command (pgCopyIn == -1) then just
!    * fall through */
!   if (AH->pgCopyIn == 1 && PQputline(AH->connection, AH->pgCopyBuf->data) != 0)
        die_horribly(AH, modulename, "error returned by PQputline\n");
  
    resetPQExpBuffer(AH->pgCopyBuf);
***
*** 400,406 
  
    if (isEnd)
    {
!       if (PQendcopy(AH->connection) != 0)
            die_horribly(AH, modulename, "error returned by PQendcopy\n");
  
        AH->pgCopyIn = 0;
--- 410,418 ----
  
    if (isEnd)
    {
!       /* If this is a failed copy command (pgCopyIn == -1) then just
!        * fall through */
!       if (AH->pgCopyIn == 1 && PQendcopy(AH->connection) != 0)
            die_horribly(AH, modulename, "error returned by PQendcopy\n");
  
        AH->pgCopyIn = 0;
***
*** 615,621 ****
    /* Could switch between command and COPY IN mode at each line */
    while (qry < eos)
    {
!       if (AH->pgCopyIn)
            qry = _sendCopyLine(AH, qry, eos);
        else
            qry = _sendSQLLine(AH, qry, eos);
--- 627,637 ----
    /* Could switch between command and COPY IN mode at each line */
    while (qry < eos)
    {
!       /* If this is a working COPY *or* a failed COPY, call
!        * _sendCopyLine to handle the incoming data from the COPY
!        * command, it will just circular-file the data if we're
!        * running a failed COPY. */
!       if (AH->pgCopyIn == 1 || AH->pgCopyIn == -1)
            qry = _sendCopyLine(AH, qry, eos);
        else
            qry = _sendSQLLine(AH, qry, eos);



[PATCHES] [PATCH] pg_restore COPY error handling

2006-01-20 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Stephen Frost [EMAIL PROTECTED] writes:
It seems like pg_restore really should be able to handle COPY errors
correctly by skipping to the end of the COPY data segment when the
initial COPY command comes back as an error.
 
 Send a patch ;-)

This is what I get for knowing how to copy & paste C code, eh? ;-)

Attached is a patch to pg_restore, against HEAD but I think it'd work
against 8.1 just fine, to better handle it when a COPY command fails
(for whatever reason) during a DB restore.

Attempting to restore from a dump with 2 table objects under 8.1:


[EMAIL PROTECTED]:/data/sfrost/postgres pg_restore -d tsf -Fc tiger_test.dump
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 412; 16672 1135494071 SCHEMA 
tiger_test postgres
pg_restore: [archiver (db)] could not execute query: ERROR:  permission denied 
for database "tsf"
Command was:
CREATE SCHEMA tiger_test;
pg_restore: [archiver (db)] could not execute query: ERROR:  schema 
"tiger_test" does not exist
Command was: ALTER SCHEMA tiger_test OWNER TO postgres;
pg_restore: [archiver] Error from TOC entry 21177; 1259 1135494072 TABLE 
bg01_d00 postgres
pg_restore: [archiver] could not set search_path to "tiger_test": ERROR:  
schema "tiger_test" does not exist
pg_restore: [archiver (db)] could not execute query: ERROR:  type 
"public.geometry" does not exist
Command was: CREATE TABLE bg01_d00 (
ogc_fid integer,
wkb_geometry public.geometry,
area numeric(20,5),
perimeter numeric...
pg_restore: [archiver (db)] could not execute query: ERROR:  schema 
"tiger_test" does not exist
Command was: ALTER TABLE tiger_test.bg01_d00 OWNER TO postgres;
pg_restore: [archiver (db)] Error from TOC entry 21178; 1259 1135497928 TABLE 
bg02_d00 postgres
pg_restore: [archiver (db)] could not execute query: ERROR:  type 
"public.geometry" does not exist
Command was: CREATE TABLE bg02_d00 (
ogc_fid integer,
wkb_geometry public.geometry,
area numeric(20,5),
perimeter numeric...
pg_restore: [archiver (db)] could not execute query: ERROR:  schema 
"tiger_test" does not exist
Command was: ALTER TABLE tiger_test.bg02_d00 OWNER TO postgres;
pg_restore: [archiver (db)] Error from TOC entry 21609; 0 1135494072 TABLE DATA 
bg01_d00 postgres
pg_restore: [archiver (db)] could not execute query: ERROR:  relation 
"bg01_d00" does not exist
Command was: COPY bg01_d00 (ogc_fid, wkb_geometry, area, perimeter, 
bg01_d00_, bg01_d00_i, state, county, tract, blkgroup, name, lsad, ls...
pg_restore: [archiver (db)] Error from TOC entry 21610; 0 1135497928 TABLE DATA 
bg02_d00 postgres
pg_restore: [archiver (db)] could not execute query: ERROR:  syntax error at or 
near "1" at character 1
Command was: 1  
01030001007D00F1283A3752FB55C0E38C825CB98041403BC3D4963AFB55C0D8D30E7F4D80414015376E313FFB55C07AE40F069E7F4140...
WARNING: errors ignored on restore: 9


As you can see, it's treating the data (the '0103...' line) as a
command, which is most certainly not right, especially when it *knows*
that the COPY command failed.

Attempting to restore from a dump with 2 table objects with patch:


[EMAIL PROTECTED]:/data/sfrost/postgres/testinstall bin/pg_restore -d tsf -Fc 
-h localhost ../tiger_test.dump
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 412; 16672 1135494071 SCHEMA 
tiger_test postgres
pg_restore: [archiver (db)] could not execute query: ERROR:  role "postgres" 
does not exist
Command was: ALTER SCHEMA tiger_test OWNER TO postgres;
pg_restore: [archiver (db)] Error from TOC entry 21177; 1259 1135494072 TABLE 
bg01_d00 postgres
pg_restore: [archiver (db)] could not execute query: ERROR:  type 
"public.geometry" does not exist
Command was: CREATE TABLE bg01_d00 (
ogc_fid integer,
wkb_geometry public.geometry,
area numeric(20,5),
perimeter numeric...
pg_restore: [archiver (db)] could not execute query: ERROR:  relation 
"tiger_test.bg01_d00" does not exist
Command was: ALTER TABLE tiger_test.bg01_d00 OWNER TO postgres;
pg_restore: [archiver (db)] Error from TOC entry 21178; 1259 1135497928 TABLE 
bg02_d00 postgres
pg_restore: [archiver (db)] could not execute query: ERROR:  type 
"public.geometry" does not exist
Command was: CREATE TABLE bg02_d00 (
ogc_fid integer,
wkb_geometry public.geometry,
area numeric(20,5),
perimeter numeric...
pg_restore: [archiver (db)] could not execute query: ERROR:  relation 
"tiger_test.bg02_d00" does not exist
Command was: ALTER TABLE tiger_test.bg02_d00 OWNER TO postgres;
pg_restore: [archiver (db)] Error from TOC entry 21609; 0 1135494072 TABLE DATA 
bg01_d00 postgres
pg_restore: [archiver (db)] could not execute query: ERROR:  relation 
"bg01_d00" does not exist
Command was: COPY bg01_d00 (ogc_fid, wkb_geometry, area, perimeter

Re: [PATCHES] pg_restore COPY error handling

2006-01-20 Thread Stephen Frost
* Stephen Frost ([EMAIL PROTECTED]) wrote:
 * Tom Lane ([EMAIL PROTECTED]) wrote:
  Stephen Frost [EMAIL PROTECTED] writes:
 It seems like pg_restore really should be able to handle COPY errors
 correctly by skipping to the end of the COPY data segment when the
 initial COPY command comes back as an error.
  
  Send a patch ;-)
 
  This is what I get for knowing how to copy & paste C code, eh? ;-)
 
 Attached is a patch to pg_restore, against HEAD but I think it'd work
 against 8.1 just fine, to better handle it when a COPY command fails
 (for whatever reason) during a DB restore.
[...]
 Command was:
 
 COPY bg02_d00 (ogc_fid, wkb_geometry, area, perimeter, bg02_d00_, bg02_d00_i, 
 state, county, tract, blkgroup, name, lsad, ...
 WARNING: errors ignored on restore: 7
 

Of course, looking at this again, I'm afraid my COPY-attempt-detection
logic isn't quite enough.  I was hoping it'd be taken care of by
_sendSQLLine, but apparently not.

The line:
if (strncasecmp(qry->data, "COPY ", 5) == 0) AH->pgCopyIn = -1;

Needs to be changed to handle whitespace in front of the actual 'COPY',
unless someone else has a better idea.  This should be reasonably
trivial to do though...  If you'd like me to make that change and send
in a new patch, just let me know.
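
In other words, something along these lines:

    int ws = 0;

    /* skip any leading whitespace before checking for COPY */
    while (isspace(qry->data[ws]))
        ws++;
    if (strncasecmp(qry->data + ws, "COPY ", 5) == 0)
        AH->pgCopyIn = -1;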

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] TRUNCATE, VACUUM, ANALYZE privileges

2006-01-04 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Stephen Frost [EMAIL PROTECTED] writes:
The following patch implements individual privileges for TRUNCATE,
VACUUM and ANALYZE.  Includes documentation and regression test
updates.  Resolves TODO item 'Add a separate TRUNCATE permission'.
 
At least the 'no one interested has written a patch' argument is gone
now, fire away with other comments/concerns. :)
 
 I have a very serious problem with the idea of inventing individual
 privilege bits for every maintenance command in sight.  That does not
 scale.  How will you handle GRANT ADD COLUMN, or GRANT ADD COLUMN
 as-long-as-its-not-SERIAL-because-I-dont-want-you-creating-sequences,
 or GRANT ALTER TABLE RELIABILITY as soon as someone writes that patch,
 or a dozen other cases that I could name without stopping for breath?

GRANT ADD COLUMN, etc, aren't maintenance commands; they're DDL
statements and as such should be the purview of the owner.  TRUNCATE,
VACUUM and ANALYZE are DML commands and are commands a user of
the table would use through the normal course of inserting, updating or
deleting data in the table.

 The proposed patch eats three of the five available privilege bits (that
 is, available without accepting the distributed cost of enlarging ACL
 bitmasks), and you've made no case at all why we should spend that
 limited resource in this particular fashion.

I've shown a specific use-case for this.  It's been asked for before by
others.  I've shown why these particular ones make sense (while 'ADD
COLUMN', etc, don't).  If we come up with more Postgres-specific DML
statements which aren't covered by other grants (which doesn't seem
terribly likely at this point) then we should add those.  I could see
making VACUUM and ANALYZE use the same bit (since one implies the other)
but I'm not really a big fan of that and I don't see any other need for
these bits coming down the line anytime soon.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] TRUNCATE, VACUUM, ANALYZE privileges

2006-01-04 Thread Stephen Frost
* daveg ([EMAIL PROTECTED]) wrote:
 We rely heavily on truncate, as delete for large numbers of rows is very
 costly. An example: we copy_in batches of rows from several sources through
 the day to a pending work table, with another process periodically
 processing the rows and sweeping them into a history table. The sweep
 leaves an empty pending work table.  Truncate is very efficient for this
 pattern.
 
 However it means that all our jobs have to run with more permissions than
 they really should have as there is no way to grant truncate. If giving
 truncate its very own permission is too wasteful of permission bits, perhaps
 having truncate be the same as delete for permissions purposes would work.

Sounds very similar to my use-case, except my users just have to suffer
with delete because I don't want to grant them additional permissions.
Having truncate act off of delete isn't actually an option
unfortunately.  This is because truncate skips triggers (probably not an
issue for you, certainly not one for me, but a problem with doing it in
the general case).

I'm not sure about you, but I know that I'd like to be able to do:
TRUNCATE, insert/copy data, ANALYZE without having to give all the other
permissions associated with ownership.
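
That is, with this patch the job account would only need something like
(table and role names invented, of course):

    GRANT TRUNCATE, INSERT, ANALYZE ON pending_work TO batch_role;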

 Alternatively a separate 'whole table operations' permission might cover
 truncate and some of the alter type things too. Of course table owner does
 this, but that is what I don't want everyone to be required to have.

I'm not entirely sure if that'd be better or not...  It would involve
changing the structure of the ACLs to have two sets for each relation
and you'd have to sometimes look at one, sometimes at the other, and
possibly both in some cases...

Thanks,

Stephen


signature.asc
Description: Digital signature


[PATCHES] TRUNCATE, VACUUM, ANALYZE privileges

2006-01-03 Thread Stephen Frost
Greetings,

  The following patch implements individual privileges for TRUNCATE,
  VACUUM and ANALYZE.  Includes documentation and regression test
  updates.  Resolves TODO item 'Add a separate TRUNCATE permission'.
  Created off of current (2006/01/03) CVS TIP.

  At least the 'no one interested has written a patch' argument is gone
  now, fire away with other comments/concerns. :)

Thanks,

Stephen
Index: doc/src/sgml/func.sgml
===
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/func.sgml,v
retrieving revision 1.301
diff -c -r1.301 func.sgml
*** doc/src/sgml/func.sgml  28 Dec 2005 01:29:58 -  1.301
--- doc/src/sgml/func.sgml  4 Jan 2006 03:38:01 -
***
*** 8961,8968 
  The desired access privilege type
  is specified by a text string, which must evaluate to one of the
  values <literal>SELECT</literal>, <literal>INSERT</literal>, <literal>UPDATE</literal>,
! <literal>DELETE</literal>, <literal>RULE</literal>, <literal>REFERENCES</literal>, or
! <literal>TRIGGER</literal>.  (Case of the string is not significant, however.)
  An example is:
  <programlisting>
  SELECT has_table_privilege('myschema.mytable', 'select');
--- 8961,8969 ----
  The desired access privilege type
  is specified by a text string, which must evaluate to one of the
  values <literal>SELECT</literal>, <literal>INSERT</literal>, <literal>UPDATE</literal>,
! <literal>DELETE</literal>, <literal>RULE</literal>, <literal>REFERENCES</literal>,
! <literal>TRIGGER</literal>, <literal>TRUNCATE</literal>, <literal>VACUUM</literal>, or
! <literal>ANALYZE</literal>.  (Case of the string is not significant, however.)
  An example is:
  <programlisting>
  SELECT has_table_privilege('myschema.mytable', 'select');
Index: doc/src/sgml/information_schema.sgml
===
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/information_schema.sgml,v
retrieving revision 1.23
diff -c -r1.23 information_schema.sgml
*** doc/src/sgml/information_schema.sgml    8 Dec 2005 20:48:10 -   
1.23
--- doc/src/sgml/information_schema.sgml    4 Jan 2006 03:38:01 -
***
*** 2395,2401 
 Type of the privilege: literalSELECT/literal,
 literalDELETE/literal, literalINSERT/literal,
 literalUPDATE/literal, literalREFERENCES/literal,
!literalRULE/literal, or literalTRIGGER/literal
/entry
   /row
  
--- 2395,2403 
 Type of the privilege: literalSELECT/literal,
 literalDELETE/literal, literalINSERT/literal,
 literalUPDATE/literal, literalREFERENCES/literal,
!literalRULE/literal, literalTRIGGER/literal,
!literalTRUNCATE/literal, literalVACUUM/literal, or
!literalANALYZE/literal.
/entry
   /row
  
***
*** 3643,3649 
 Type of the privilege: literalSELECT/literal,
 literalDELETE/literal, literalINSERT/literal,
 literalUPDATE/literal, literalREFERENCES/literal,
!literalRULE/literal, or literalTRIGGER/literal
/entry
   /row
  
--- 3645,3653 
 Type of the privilege: literalSELECT/literal,
 literalDELETE/literal, literalINSERT/literal,
 literalUPDATE/literal, literalREFERENCES/literal,
!literalRULE/literal, literalTRIGGER/literal,
!literalTRUNCATE/literal, literalVACUUM/literal, or
!literalANALYZE/literal.
/entry
   /row
  
Index: doc/src/sgml/user-manag.sgml
===
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/user-manag.sgml,v
retrieving revision 1.33
diff -c -r1.33 user-manag.sgml
*** doc/src/sgml/user-manag.sgml20 Oct 2005 19:18:00 -  1.33
--- doc/src/sgml/user-manag.sgml4 Jan 2006 03:38:01 -
***
*** 296,301 
--- 296,302 
 There are several different kinds of privilege: literalSELECT/,
 literalINSERT/, literalUPDATE/, literalDELETE/,
 literalRULE/, literalREFERENCES/, literalTRIGGER/,
+literalTRUNCATE/, literalVACUUM/, literalANALYZE/,
 literalCREATE/, literalTEMPORARY/, literalEXECUTE/,
 and literalUSAGE/. For more
 information on the different types of privileges supported by
Index: doc/src/sgml/ref/grant.sgml
===
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/ref/grant.sgml,v
retrieving revision 1.50
diff -c -r1.50 grant.sgml
*** doc/src/sgml/ref/grant.sgml 20 Oct 2005 19:18:01 -  1.50
--- doc/src/sgml/ref/grant.sgml 4 Jan 2006 03:38:02 -
***
*** 20,26 
  
   refsynopsisdiv
  synopsis
! GRANT { { SELECT | INSERT | UPDATE | DELETE | RULE | REFERENCES | TRIGGER }
  [,...] | ALL [ PRIVILEGES ] }
  ON [ TABLE ] replaceable class=PARAMETERtablename/replaceable [, 
...]
  TO { replaceable class=PARAMETERusername/replaceable | 

Re: [PATCHES] Roles - SET ROLE Updated

2005-07-21 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 BTW, I realized we do not support granting roles to PUBLIC:
 
 regression=# create role r;
 CREATE ROLE
 regression=# grant r to public;
 ERROR:  role "public" does not exist
 
 but as far as I can tell SQL99 expects this to work.

Indeed, I believe you're correct, sorry about missing that.

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] Roles - SET ROLE Updated

2005-07-21 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Another issue: I like the has_role() function and in fact think it needs
 to come in multiple variants just like has_table_privilege and friends:
 
   has_role(name, name)
   has_role(name, oid)
   has_role(oid, name)
   has_role(oid, oid)
   has_role(name)  -- implicitly has_role(current_user, ...)
   has_role(oid)
 
 However I'm a bit dubious about whether has_role isn't an invasion of
 application namespace.  pg_has_role would be better, but we have the
 (mis) precedent of has_table_privilege.  What do you think about calling
 it has_role_privilege?

I thought about that originally.  It seemed a bit long to me, and I felt
that having the 'privilege' of a role wasn't quite the same as having a
'role'.  Honestly, though, I'm not terribly picky, and on reflection a
role *is* like other objects in the catalog (I originally hadn't
considered it such), so that's fine with me...
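
For reference, here's how I'd expect the variants to be used, whatever
name we settle on (role names are hypothetical):

SELECT has_role('alice', 'admin');  -- is alice a member of role admin?
SELECT has_role('admin');           -- implicitly has_role(current_user, ...)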

has_role() was another reason I was thinking about having a separate
function for 'is_member_of_role' which didn't pollute the cache, just a
side-note.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] Change Ownership Permission Checks

2005-07-15 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Stephen Frost [EMAIL PROTECTED] writes:
Attached please find a patch to change how the permissions checking
for alter-owner is done.  With roles there can be more than one
'owner' of an object and therefore it becomes sensible to allow
specific cases of ownership change for non-superusers.
 
 Applied with minor revisions.  The patch as submitted suffered a certain
 amount of copy-and-paste-itis (eg, trying to use pg_type_ownercheck on
 an opclass), and I really disliked using ACLCHECK_NOT_OWNER as the way
 to report "you can't assign ownership to that role because you are not
 a member of it".  So I made a separate error message for that case.

Great, thanks!  Sorry about the copy-and-paste-itis...  Must have been a
case I wasn't sure about.  The different error message makes perfect
sense.  I see you also did the superuser-in-every-role change that I had
included, thanks.

When writing this patch it occurred to me that we nuke our
member-of-role cache for one-off lookups on occasion.  I don't
particularly like that, especially when we *know* it's a one-off lookup,
so I was considering adding a function for the one-off lookup case, but
I couldn't come up with a way to avoid a fair bit of mostly-the-same
code as the current cache-regen code without making the cache-regen
code a lot slower, which would negate the point.

Just some thoughts.

Thanks again,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] Disable page writes when fsync off, add GUC

2005-07-04 Thread Stephen Frost
* Bruce Momjian (pgman@candle.pha.pa.us) wrote:
 This patch disables page writes to WAL when fsync is off, because with
 no fsync guarantee, the page write recovery isn't useful.

This doesn't seem quite right to me.  What happens with PITR?  And with
Postgres crashes?  While many people seriously distrust running w/ fsync
off, I'm sure there are quite a few folks who do.

 This also adds a full_page_writes GUC to turn off page writes to WAL. 
 Some people might not want full_page_writes, but still might want fsync.

Adding an option to not do full-page writes to WAL seems fine to me, but
I think those writes should be on by default, even in the fsync=off
case.  If people want to turn either option off, fine, since we expect
they understand what it means to have it turned off, but I don't think
the two options should be coupled as is being proposed.
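
To make the decoupling concrete, this is how I'd want a postgresql.conf
to be able to read (GUC name per your patch; the comments are just my
reading of the semantics):

fsync = off             # no flush guarantees; the user accepts the risk
full_page_writes = on   # but still write full pages to WAL by default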

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] per user/database connections limit again

2005-07-02 Thread Stephen Frost
* Petr Jelinek ([EMAIL PROTECTED]) wrote:
 + if (!(superuser()
 +       || ((Form_pg_database) GETSTRUCT(tuple))->datdba == GetUserId()))
 +     aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_DATABASE,
 +                    stmt->dbname);

This should almost certainly be a pg_database_ownercheck() call instead.

The rest needs to be updated for roles, but it looks like that should be
pretty easy to do.  Much of it just needs to be repatched; the parts
that do need to be changed look to be pretty simple.

I believe the use of SessionUserId is probably correct in this patch.
This does mean that this patch will only be for canlogin roles, but that
seems like it's probably correct.  Handling roles w/ members would
require much more thought.
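
For anyone following along, the user-visible syntax this patch is after
is roughly the following (the exact spelling may well change before
commit):

ALTER USER webuser CONNECTION LIMIT 10;
ALTER DATABASE appdb CONNECTION LIMIT 50;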

Thanks,

Stephen


signature.asc
Description: Digital signature


[PATCHES] Roles - SET ROLE

2005-07-01 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Stephen Frost [EMAIL PROTECTED] writes:
  Tom, if you're watching, are you working on this?  I can probably spend
  some time today on it, if that'd be helpful.
 
 I am not; I was hoping you'd deal with SET ROLE.  Is it really much
 different from SET SESSION AUTHORIZATION?

Here's what I've got done so far on SET ROLE.  I wasn't able to get as
much done as I'd hoped to.  This is mostly just to get something posted
before the end of today, in case anyone thinks it's more of a feature
than a bug needing to be fixed (the latter being what I'd consider it,
personally).  I'll try to work on it some this weekend, but in the US
it's a holiday weekend and I'm pretty busy. :/
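
For clarity, the user-visible behavior being implemented is the
SQL-standard pair (the role name is just an example):

SET ROLE admin;   -- adopt a role the session user is a member of
RESET ROLE;       -- revert to the session's own identity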

Thanks,

Stephen
? src/backend/parser/.gram.y.swp
Index: src/backend/commands/variable.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/commands/variable.c,v
retrieving revision 1.109
diff -c -r1.109 variable.c
*** src/backend/commands/variable.c	28 Jun 2005 05:08:55 -0000	1.109
--- src/backend/commands/variable.c	1 Jul 2005 21:34:16 -0000
***************
*** 564,569 ****
--- 564,680 ----
  
  
  /*
+  * SET ROLE
+  *
+  * When resetting session auth after an error, we can't expect to do catalog
+  * lookups.  Hence, the stored form of the value must provide a numeric oid
+  * that can be re-used directly.  We store the string in the form of
+  * NAMEDATALEN 'x's, followed by T or F to indicate superuserness, followed
+  * by the numeric oid, followed by a comma, followed by the role name.
+  * This cannot be confused with a plain role name because of the NAMEDATALEN
+  * limit on names, so we can tell whether we're being passed an initial
+  * role name or a saved/restored value.
+  */
+ extern char *role_string; /* in guc.c */
+ 
+ const char *
+ assign_role(const char *value, bool doit, GucSource source)
+ {
+   Oid roleid = InvalidOid;
+   bool        is_superuser = false;
+   const char *actual_rolename = NULL;
+   char   *result;
+ 
+   if (strspn(value, "x") == NAMEDATALEN &&
+       (value[NAMEDATALEN] == 'T' || value[NAMEDATALEN] == 'F'))
+   {
+   /* might be a saved userid string */
+   Oid savedoid;
+   char   *endptr;
+ 
+       savedoid = (Oid) strtoul(value + NAMEDATALEN + 1, &endptr, 10);
+
+       if (endptr != value + NAMEDATALEN + 1 && *endptr == ',')
+   {
+   /* syntactically valid, so break out the data */
+   roleid = savedoid;
+   is_superuser = (value[NAMEDATALEN] == 'T');
+   actual_rolename = endptr + 1;
+   }
+   }
+ 
+   if (roleid == InvalidOid)
+   {
+   /* not a saved ID, so look it up */
+   HeapTuple   roleTup;
+ 
+   if (!IsTransactionState())
+   {
+           /*
+            * Can't do catalog lookups, so fail.  The upshot of this is
+            * that the role cannot be set in postgresql.conf, which seems
+            * like a good thing anyway.
+            */
+           return NULL;
+       }
+ 
+       roleTup = SearchSysCache(AUTHNAME,
+                                PointerGetDatum(value),
+                                0, 0, 0);
+       if (!HeapTupleIsValid(roleTup))
+       {
+           if (source >= PGC_S_INTERACTIVE)
+               ereport(ERROR,
+                       (errcode(ERRCODE_UNDEFINED_OBJECT),
+                        errmsg("role \"%s\" does not exist", value)));
+           return NULL;
+       }
+ 
+   roleid = HeapTupleGetOid(roleTup);
+       is_superuser = ((Form_pg_authid) GETSTRUCT(roleTup))->rolsuper;
+   actual_rolename = value;
+ 
+   ReleaseSysCache(roleTup);
+   }
+ 
+   if (doit)
+   SetRole(roleid);
+ 
+   result = (char *) malloc(NAMEDATALEN + 32 + strlen(actual_rolename));
+   if (!result)
+   return NULL;
+ 
+   memset(result, 'x', NAMEDATALEN);
+ 
+   sprintf(result + NAMEDATALEN, "%c%u,%s",
+   is_superuser ? 'T' : 'F',
+   roleid,
+   actual_rolename);
+ 
+   return result;
+ }
+ 
+ const char *
+ show_role(void)
+ {
+   /*
+* Extract the role name from the stored string; see
+* assign_role
+*/
+   const char *value = role_string;
+   Oid savedoid;
+   char   *endptr;
+ 
+   Assert(strspn(value, "x") == NAMEDATALEN &&
+  (value

[PATCHES] Change Ownership Permission Checks

2005-06-29 Thread Stephen Frost
Greetings,

  Attached please find a patch to change how the permissions checking
  for alter-owner is done.  With roles there can be more than one
  'owner' of an object and therefore it becomes sensible to allow
  specific cases of ownership change for non-superusers.

  The permission checks for change-owner follow the alter-rename
  precedent that the new owner must have permission to create the object
  in the schema.
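
  As a concrete sketch of what this allows for a non-superuser (names
  are examples, not from the patch):

    ALTER TABLE some_table OWNER TO other_role;

  which is permitted when the current user owns some_table, is a member
  of other_role, and other_role has CREATE permission on some_table's
  schema.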

  The roles patch previously applied did not require the role for 
  which a database is being created to have createdb privileges, or for
  the role for which a schema is being created to have create
  privileges on the database (the role doing the creation did have to
  have those privileges though, of course).

  For 'container' type objects this seems reasonable.  'container' type
  objects are unlike others in a few ways, but one of the more notable
  differences for this case is that an owner may be specified as part of
  the create command.

  To support cleaning up the various checks, I also went ahead and
  modified is_member_of_role() to always return true when asked if a
  superuser is in a given role.  This seems reasonable, won't affect
  what's actually seen in the various tables, and allows us to eliminate
  explicit superuser() checks in a number of places.

  I have also reviewed the other superuser() calls in
  src/backend/commands/ and feel pretty comfortable that they're all
  necessary, reasonable, and don't need to be replaced with 
  *_ownercheck or other calls.

  The specific checks which have been changed, by file:
  aggregatecmds.c, alter-owner:
alter-owner checks:
  User is owner of the to-be-changed object
  User is a member of the new owner's role
  New owner is permitted to create objects in the schema
  Superuser() requirement removed

  conversioncmds.c, rename:
rename-checks:
  Changed from superuser() or same-roleId to pg_conversion_ownercheck
alter-owner checks:
  User is owner of the to-be-changed object
  User is a member of the new owner's role
  New owner is permitted to create objects in the schema
  Superuser() requirement removed

  dbcommands.c:
Moved superuser() check to have_createdb_privilege
Cleaned up permissions checking in createdb and rename
alter-owner checks:
  User is owner of the database
  User is a member of the new owner's role
  User has createdb privilege

  functioncmds.c:
alter-owner checks:
  User is owner of the function
  User is a member of the new owner's role
  New owner is permitted to create objects in the schema

  opclasscmds.c:
alter-owner checks:
  User is owner of the object
  User is a member of the new owner's role
  New owner has permission to create objects in the schema

  operatorcmds.c:
alter-owner checks:
  User is owner of the object
  User is a member of the new owner's role
  New owner has permission to create objects in the schema

  schemacmds.c:
Cleaned up create schema identity changing/setting/checking
(This code was quite different from all the other create functions,
 these changes make it much more closely match createdb)
alter-owner checks:
  User is owner of the schema
  User is a member of the new owner's role
  User has create privilege on database

  tablecmds.c:
alter-owner checks:
  User is owner of the object
  User is a member of the new owner's role
  New owner has permission to create objects in the schema

  tablespace.c:
alter-owner checks:
  User is owner of the tablespace
  User is a member of the new owner's role
  (No create-tablespace permission to check, tablespaces must be
   created by superusers and so alter-owner here really only matters
   if the superuser changed the tablespace owner to a non-superuser
   and then that non-superuser wants to change the ownership to yet
   another user, the other option would be to continue to force
   superuser-only for tablespace owner changes but I'm not sure I
   see the point if the superuser trusts the non-superuser enough to
   give them a tablespace...)

  typecmds.c:
alter-owner checks:
  User is owner of the object
  User is a member of the new owner's role
  New owner has permission to create objects in the schema

  Many thanks.  As always, comments, questions, concerns, please let me
  know.

Thanks again,

Stephen
? src/backend/commands/.typecmds.c.swp
Index: src/backend/commands/aggregatecmds.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/commands/aggregatecmds.c,v
retrieving revision 1.27
diff -c -r1.27 aggregatecmds.c
*** src/backend/commands/aggregatecmds.c	28 Jun 2005 05:08:53 -0000	1.27
--- src/backend/commands/aggregatecmds.c	29 Jun 2005 15:29:57 -0000
***************
*** 299,307 ****

Re: [PATCHES] Grammer Cleanup

2005-06-10 Thread Stephen Frost
* Bruce Momjian (pgman@candle.pha.pa.us) wrote:
 I am removing this patch from the queue because without the role
 feature I don't think it makes sense to rename the grammar tokens.

Sure.  I've rolled this patch into my role tree so these changes will be
included when the CREATE ROLE, etc is introduced.

Thanks,

Stephen

 Stephen Frost wrote:
  * Tom Lane ([EMAIL PROTECTED]) wrote:
   Stephen Frost [EMAIL PROTECTED] writes:
    Ok, should I change SchemaName & SavePointId back to ColId,
   
   I'd just leave them as ColId.  I don't think much would be gained by
   introducing those productions.
  
  Done, here's the patch.
  
  Thanks,
  
  Stephen
 
 [ Attachment, skipping... ]
 
  
 
 -- 
   Bruce Momjian|  http://candle.pha.pa.us
   pgman@candle.pha.pa.us   |  (610) 359-1001
   +  If your life is a hard drive, |  13 Roberts Road
   +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073


signature.asc
Description: Digital signature


[PATCHES] Move get_grosysid() to utils/cache/lsyscache.c

2004-12-29 Thread Stephen Frost
Greetings,

  Small patch to move get_grosysid() from catalog/aclchk.c to 
  utils/cache/lsyscache.c where it can be used by other things.  Also
  cleans up both get_usesysid() and get_grosysid() a bit.  This is in
  preparation for 'Group Ownership' support.

Thanks,

Stephen
diff -u -u -r1.107 aclchk.c
--- src/backend/catalog/aclchk.c	29 Aug 2004 05:06:41 -0000	1.107
+++ src/backend/catalog/aclchk.c	29 Dec 2004 16:32:57 -0000
@@ -1208,28 +1208,6 @@
return NULL;/* appease compiler */
 }
 
-
-AclId
-get_grosysid(char *groname)
-{
-   HeapTuple   tuple;
-   AclId   id = 0;
-
-   tuple = SearchSysCache(GRONAME,
-  PointerGetDatum(groname),
-  0, 0, 0);
-   if (HeapTupleIsValid(tuple))
-   {
-       id = ((Form_pg_group) GETSTRUCT(tuple))->grosysid;
-   ReleaseSysCache(tuple);
-   }
-   else
-       ereport(ERROR,
-               (errcode(ERRCODE_UNDEFINED_OBJECT),
-                errmsg("group \"%s\" does not exist",
-                       groname)));
-   return id;
-}
-
 /*
  * Convert group ID to name, or return NULL if group can't be found
  */
diff -u -u -r1.118 lsyscache.c
--- src/backend/utils/cache/lsyscache.c	5 Nov 2004 19:16:14 -0000	1.118
+++ src/backend/utils/cache/lsyscache.c	29 Dec 2004 16:32:58 -0000
@@ -25,6 +25,7 @@
 #include "catalog/pg_operator.h"
 #include "catalog/pg_proc.h"
 #include "catalog/pg_shadow.h"
+#include "catalog/pg_group.h"
 #include "catalog/pg_statistic.h"
 #include "catalog/pg_type.h"
 #include "nodes/makefuncs.h"
@@ -2032,7 +2033,7 @@
 AclId
 get_usesysid(const char *username)
 {
-   int32   result;
+   AclId   userId;
HeapTuple   userTup;
 
userTup = SearchSysCache(SHADOWNAME,
@@ -2043,9 +2044,39 @@
 				(errcode(ERRCODE_UNDEFINED_OBJECT),
 				 errmsg("user \"%s\" does not exist",
 						username)));
 
-   result = ((Form_pg_shadow) GETSTRUCT(userTup))->usesysid;
+   userId = ((Form_pg_shadow) GETSTRUCT(userTup))->usesysid;
 
ReleaseSysCache(userTup);
 
-   return result;
+   return userId;
+}
+
+/*
+ * get_grosysid
+ *
+ *   Given a group name, look up the group's sysid.
+ *   Raises an error if no such group (rather than returning zero,
+ *   which might possibly be a valid grosysid).
+ *
+ */
+AclId
+get_grosysid(char *groname)
+{
+   AclId   groupId;
+   HeapTuple   groupTup;
+
+   groupTup = SearchSysCache(GRONAME,
+  PointerGetDatum(groname),
+  0, 0, 0);
+   if (!HeapTupleIsValid(groupTup))
+       ereport(ERROR,
+               (errcode(ERRCODE_UNDEFINED_OBJECT),
+                errmsg("group \"%s\" does not exist",
+                       groname)));
+
+   groupId = ((Form_pg_group) GETSTRUCT(groupTup))->grosysid;
+
+   ReleaseSysCache(groupTup);
+
+   return groupId;
 }
+
Index: src/include/utils/acl.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/utils/acl.h,v
retrieving revision 1.75
diff -u -u -r1.75 acl.h
--- src/include/utils/acl.h	29 Aug 2004 05:06:58 -0000	1.75
+++ src/include/utils/acl.h	29 Dec 2004 16:32:58 -0000
@@ -245,7 +245,6 @@
  * prototypes for functions in aclchk.c
  */
 extern void ExecuteGrantStmt(GrantStmt *stmt);
-extern AclId get_grosysid(char *groname);
 extern char *get_groname(AclId grosysid);
 
 extern AclMode pg_class_aclmask(Oid table_oid, AclId userid,
Index: src/include/utils/lsyscache.h
===
RCS file: /projects/cvsroot/pgsql/src/include/utils/lsyscache.h,v
retrieving revision 1.92
diff -u -u -r1.92 lsyscache.h
--- src/include/utils/lsyscache.h   5 Nov 2004 19:16:41 -   1.92
+++ src/include/utils/lsyscache.h   29 Dec 2004 16:32:58 -
@@ -115,7 +115,8 @@
  Datum *values, int nvalues,
  float4 *numbers, int nnumbers);
 extern char *get_namespace_name(Oid nspid);
-extern int32 get_usesysid(const char *username);
+extern AclId get_usesysid(const char *username);
+extern AclId get_grosysid(char *groname);
 
 #define is_array_type(typid)  (get_element_type(typid) != InvalidOid)
 



Re: [PATCHES] Grammer Cleanup

2004-12-29 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Stephen Frost [EMAIL PROTECTED] writes:
Small patch to clean up the grammar a bit by adding 'GroupId',
'SchemaName' and 'SavePointId'.
 
 I don't particularly see the value of this --- especially since the
 direction of future development is likely to be to remove the
 distinction between user and group names, rather than make it stronger.

Alright, I can redo it w/ UserId in place of GroupId everywhere.  I'd
like your comment on the 'pg_class changes for group ownership' on
-hackers, which might play into these changes too.  Previously 'GRANT' 
had 'ColId' instead of either, which isn't really appropriate there...
Do you agree with the other changes (ColId -> SchemaName, ColId ->
SavePointId)?

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] Grammer Cleanup

2004-12-29 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Stephen Frost [EMAIL PROTECTED] writes:
  Do you agree with the other changes (ColId -> SchemaName, ColId ->
  SavePointId) ?
 
 I don't really see the value of them.  They add some marginal
 documentation I suppose, but they also make the grammar bigger and
 slower.  A more substantial objection to the practice is that it can
 introduce needless shift/reduce conflicts, by forcing the parser into
 making unnecessary decisions before it has enough context to
 determine what kind of name a particular token really is.

Perhaps the name of 'ColId' should be changed then.  At least to me that
comes across as 'Column Identifier', and apparently some others thought
it wasn't appropriate for user names (UserId existed and was mapped to 
ColId prior to my patch).  Personally, I'd just like it to be
consistent; when I was looking at how to add the grammar for group
ownership, group names were identified in one place as 'ColId' and another
as 'UserId', iirc.

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] Grammer Cleanup

2004-12-29 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Given other discussion, it might be best to rename it to RoleId and use
 that for both users and groups.

Ok, should I change SchemaName & SavePointId back to ColId, leave them
as in the patch, change them to RoleId, or something else?  Neither
ColId nor RoleId seems appropriate for those..

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] Grammer Cleanup

2004-12-29 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Stephen Frost [EMAIL PROTECTED] writes:
  Ok, should I change SchemaName & SavePointId back to ColId,
 
 I'd just leave them as ColId.  I don't think much would be gained by
 introducing those productions.

Done, here's the patch.

Thanks,

Stephen
diff -u -u -r2.480 gram.y
--- src/backend/parser/gram.y	8 Nov 2004 04:02:20 -0000	2.480
+++ src/backend/parser/gram.y	29 Dec 2004 19:04:29 -0000
@@ -304,7 +304,7 @@
 
 %type <ival>	Iconst
 %type <str>		Sconst comment_text
-%type <str>		UserId opt_boolean ColId_or_Sconst
+%type <str>		RoleId opt_boolean ColId_or_Sconst
 %type <list>	var_list var_list_or_default
 %type <str>		ColId ColLabel var_name type_name param_name
 %type <node>	var_value zone_value
@@ -570,7 +570,7 @@
  */
 
 CreateUserStmt:
-			CREATE USER UserId opt_with OptUserList
+			CREATE USER RoleId opt_with OptUserList
 				{
 					CreateUserStmt *n = makeNode(CreateUserStmt);
 					n->user = $3;
@@ -592,7 +592,7 @@
  */
 
 AlterUserStmt:
-			ALTER USER UserId opt_with OptUserList
+			ALTER USER RoleId opt_with OptUserList
 				 {
 					AlterUserStmt *n = makeNode(AlterUserStmt);
 					n->user = $3;
@@ -603,7 +603,7 @@
 
 
 AlterUserSetStmt:
-			ALTER USER UserId SET set_rest
+			ALTER USER RoleId SET set_rest
 				{
 					AlterUserSetStmt *n = makeNode(AlterUserSetStmt);
 					n->user = $3;
@@ -611,7 +611,7 @@
 					n->value = $5->args;
 					$$ = (Node *)n;
 				}
-			| ALTER USER UserId VariableResetStmt
+			| ALTER USER RoleId VariableResetStmt
 				{
 					AlterUserSetStmt *n = makeNode(AlterUserSetStmt);
 					n->user = $3;
@@ -691,8 +691,8 @@
 				}
 			;
 
-user_list:	user_list ',' UserId		{ $$ = lappend($1, makeString($3)); }
-			| UserId					{ $$ = list_make1(makeString($1)); }
+user_list:	user_list ',' RoleId		{ $$ = lappend($1, makeString($3)); }
+			| RoleId					{ $$ = list_make1(makeString($1)); }
 			;
 
 
@@ -705,7 +705,7 @@
  */
 
 CreateGroupStmt:
-			CREATE GROUP_P UserId opt_with OptGroupList
+			CREATE GROUP_P RoleId opt_with OptGroupList
 				{
 					CreateGroupStmt *n = makeNode(CreateGroupStmt);
 					n->name = $3;
@@ -742,7 +742,7 @@
  */
 
 AlterGroupStmt:
-			ALTER GROUP_P UserId add_drop USER user_list
+			ALTER GROUP_P RoleId add_drop USER user_list
 				{
 					AlterGroupStmt *n = makeNode(AlterGroupStmt);
 					n->name = $3;
@@ -765,7 +765,7 @@
  */
 
 DropGroupStmt:
-			DROP GROUP_P UserId
+			DROP GROUP_P RoleId
 				{
 					DropGroupStmt *n = makeNode(DropGroupStmt);
 					n->name = $3;
@@ -781,7 +781,7 @@
  */
 
 CreateSchemaStmt:
-			CREATE SCHEMA OptSchemaName AUTHORIZATION UserId OptSchemaEltList
+			CREATE SCHEMA OptSchemaName AUTHORIZATION RoleId OptSchemaEltList
 				{
 					CreateSchemaStmt *n = makeNode(CreateSchemaStmt);
 					/* One can omit the schema name or the authorization id. */
@@ -1300,8 +1300,8 @@
 
 /* Subcommands that are for ALTER TABLE or ALTER INDEX */
 alter_rel_cmd:
-			/* ALTER [TABLE|INDEX] name OWNER TO UserId */
-			OWNER TO UserId
+			/* ALTER [TABLE|INDEX] name OWNER TO RoleId

Re: [PATCHES] Add error-checking to timestamp_recv

2004-06-03 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 I said:
  I'll make a note to do something with this issue after the TZ patch
  is in.
 
 I've applied a patch to take care of this problem.

Great, thanks, much appreciated.  I'll test once 7.5 goes beta.

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] Add error-checking to timestamp_recv

2004-05-20 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Stephen Frost [EMAIL PROTECTED] writes:
  * Bruce Momjian ([EMAIL PROTECTED]) wrote:
  Would you show an example of the invalid value this is trying to avoid?
 
  Well, the way I discovered the problem was by sending a timestamp in
  double format when the server was expecting one in int64 format.
 
 Most of the time, though, this sort of error would still yield a valid
 value that just failed to represent the timestamp value you wanted.
 I'm unsure that a range check is going to help much.

I'm not sure I agree about the 'most of the time' comment- I haven't done
extensive tests yet, but I wouldn't be at all surprised if most of the
time range around the current date, when stored as a double and passed
to a database expecting an int64, triggers the problem.

The issue isn't about the wrong date being shown or anything, it's that
the database accepts the value but then throws errors whenever you try
to view the table.  The intent of the patch was to use the exact same
logic to test if the date is valid on the incoming side as is used to
test the date before displaying it.  This would hopefully make it
impossible for someone to run into this problem in the future.  It would
also make it much clearer to the programmer that the date he's passing
is bad and not that there's some corruption happening in the database
after the date is accepted.

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] Add error-checking to timestamp_recv

2004-05-20 Thread Stephen Frost
* Bruce Momjian ([EMAIL PROTECTED]) wrote:
 Stephen Frost wrote:
 -- Start of PGP signed section.
  * Bruce Momjian ([EMAIL PROTECTED]) wrote:
   Would you show an example of the invalid value this is trying to avoid?
  
  Well, the way I discovered the problem was by sending a timestamp in
  double format when the server was expecting one in int64 format.  This
  is when using the binary data method for timestamps.  I'll generate a
  small example program/schema later today and post it to the list.
 
 So you are passing the data via binary COPY or a C function?

Sorry I wasn't clear, it's using libpq and binary data using an 'insert'
statement.  The code looks something like this:

PQexec(connection, "prepare addrow (timestamp) as insert into mytable
values ($1)");
lengths[0] = sizeof(double);
formats[0] = 1;
*(double*)(values[0]) = tv_sec - ((POSTGRES_EPOCH_JDATE -
UNIX_EPOCH_JDATE) * 86400) + (tv_usec / 1000000.00);
PQexecPrepared(connection, "addrow", 1, (void*)values, lengths, formats, 0);

While the new code is something like:

int64_t pg_timestamp;
PQexec(connection, "prepare addrow (timestamp) as insert into mytable
values ($1)");
lengths[0] = sizeof(int64_t);
formats[0] = 1;
pg_timestamp = ((tv_sec - ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) *
86400)) * (int64_t)1000000) + tv_usec;
*(int64_t*)(values[0]) = bswap_64(pg_timestamp);
PQexecPrepared(connection, "addrow", 1, (void*)values, lengths, formats, 0);

I'll see about writing up a proper test case/schema.  Looks like I'm
probably most of the way there at this point, really. ;)

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] Add error-checking to timestamp_recv

2004-05-20 Thread Stephen Frost
* Bruce Momjian ([EMAIL PROTECTED]) wrote:
 Stephen Frost wrote:
  I'll see about writing up a proper test case/schema.  Looks like I'm
  probably most of the way there at this point, really. ;)
 
 I wasn't aware you could throw binary values into the timestamp fields
 like that.  I thought you needed to use a C string for the value.
 
 Does PREPARE bypass that for some reason?

I don't think so..  As I recall, I think I might have had it set up w/o
using a prepare before and it was working, but I'm not sure.  It's
certainly very useful and lets me bypass *a lot* of overhead
(converting to a string and then making the database convert back...).
The one complaint I do have is that I don't see a way to pass a
timestamp w/ an explicit timezone in binary format into a table which
has a 'timestamp with timezone' field.  I can pass a binary timestamp
into a 'timestamp with timezone' field, but it's interpreted as UTC or
the local timezone (can't remember which atm).

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] Add error-checking to timestamp_recv

2004-05-20 Thread Stephen Frost
* Bruce Momjian ([EMAIL PROTECTED]) wrote:
 Considering all the other things the database is doing, I can't imagine
 that would be a measurable improvement.

It makes it easier on my client program too, which is listening to an
ethernet interface, trying to process all of the packets coming in
off of it, and putting timestamps and header information into the
database.  The table in the database doesn't have any constraints or
primary keys on it or anything; it's pretty much as simple as I could
make it. :)

  The one complaint I do have is that I don't see a way to pass a
  timestamp w/ an explicit timezone in binary format into a table which
  has a 'timestamp with timezone' field.  I can pass a binary timestamp
  into a 'timestamp with timezone' field, but it's interpreted as UTC or
  the local timezone (can't remember which atm).
 
 I still do not understand how this is working.  It must be using our
 fast path as part of prepare.  What language is you client code?

It's just plain ol' C.  It's a pretty short/simple program, really.  It
uses libpcap to listen to the interface, checks the type of packet
(ethernet, IP, UDP/TCP, etc), and copies the binary header values into
the structure which it then passes to PQexecPrepared.  It's kind of
amazing: under Linux 2.6 you can actually calculate the delay and
bandwidth pretty accurately through a network (7 'backbone' nodes, each
with a backbone router, an edge router, and an access router, all in a
lab) by listening on two interfaces, one on each side, to compute
one-way propagation time.

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] Add error-checking to timestamp_recv

2004-05-20 Thread Stephen Frost
* Bruce Momjian ([EMAIL PROTECTED]) wrote:
 Tom Lane wrote:
  Bruce Momjian [EMAIL PROTECTED] writes:
   I wasn't aware you could throw binary values into the timestamp fields
   like that.  I thought you needed to use a C string for the value.
  
  This facility was added in 7.4 as part of the wire-protocol overhaul.
  It's nothing directly to do with PREPARE; you could get the same result
  with no prepared statement using PQexecParams.
 
 Ah, no wonder I had not seen that before.  So, I guess the issue is how
 much error checking do we want to have for these binary values.  I was a
 little disturbed to hear he could insert data he couldn't later view. 
 How many datatypes have this issue?

I don't think that many do..  A number of them already check incoming
values where it's possible for them to not be valid.  For example,
'macaddr' accepts all possible binary values, while 'inet' does error checking
on input.  Binary timestamps were the only place I found in the work I
was doing where this could happen and I managed to mess up most of the
fields in one way or another before I figured it all out. :)

Stephen


signature.asc
Description: Digital signature