Re: [PATCHES] pg_dump additional options for performance
* Tom Lane ([EMAIL PROTECTED]) wrote:
> Stephen Frost <[EMAIL PROTECTED]> writes:
> > I dislike, and doubt that I'd use, this approach.  At the end of the
> > day, it ends up processing the same (very large amount of data)
> > multiple times.
>
> Well, that's easily avoided: just replace the third step by restoring
> directly to the target database.
>
> 	pg_restore --schema-before-data whole.dump >before.sql
> 	edit before.sql
> 	pg_restore --schema-after-data whole.dump >after.sql
> 	edit after.sql
> 	psql -f before.sql target_db
> 	pg_restore --data-only -d target_db whole.dump
> 	psql -f after.sql target_db

This would depend on the dump being in the custom format, though I
suppose that ends up being true for any usage of these options.

I've never really been a fan of the custom format, in large part because
it doesn't really buy you all that much and makes changing things more
difficult (by having to extract out what you want to change, and then
omit it from the restore).  I can see some advantage to having the
entire dump contained in a single file and still being able to pull out
pieces based on before/after.

Should we get a binary format which is much faster, I could see myself
being more likely to use pg_restore.  Same for parallelization or, in my
fantasies, the ability to copy schema, tables, indexes, etc, in 'raw' PG
format between servers.

Worse than having to vi an insanely large file, or split it up to be
able to modify the pieces you want, is having to rebuild indexes,
especially GiST ones.  That's another topic, though.

	Thanks,

		Stephen
Re: [PATCHES] pg_dump additional options for performance
* Tom Lane ([EMAIL PROTECTED]) wrote:
> Right, but the parallelization is going to happen sometime, and it is
> going to happen in the context of pg_restore.  So I think it's pretty
> silly to argue that no one will ever want this feature to work in
> pg_restore.

I think you've about convinced me on this, and it annoys me. ;)  Worse
is that it sounds like this might cause the options to not make it in
for 8.4, which would be quite frustrating.

> To extend the example I just gave to Stephen, I think a fairly
> probable scenario is where you only need to tweak some "before" object
> definitions, and then you could do
>
> 	pg_restore --schema-before-data whole.dump >before.sql
> 	edit before.sql
> 	psql -f before.sql target_db
> 	pg_restore --data-only --schema-after-data -d target_db whole.dump
>
> which (given a parallelizing pg_restore) would do all the
> time-consuming steps in a fully parallelized fashion.

Alright, this has been mulling around in the back of my head a bit and
has now finally surfaced: I like having the whole dump contained in a
single file, but I hate having the dump file end up out-dated, wrong, or
otherwise not what was actually loaded.  It doesn't seem likely to be
possible, but it'd be neat to be able to modify objects in the dump
file.

Also, something which often happens to me is that I need to change the
search_path or the role at the top of a .sql from pg_dump before
restoring it.  Seems like using the custom format would make that
difficult without some pipe/cat/sed magic.  Parallelization would make
using that kind of magic more difficult too, I would guess.  Might be
something to think about.

	Thanks,

		Stephen
Re: [PATCHES] pg_dump additional options for performance
* Joshua D. Drake ([EMAIL PROTECTED]) wrote:
> Custom format rocks for partial set restores from a whole dump.  See
> the TOC option :)

I imagine it does, but that's very rarely what I need.  Most of the time
we're dumping out a schema to load it into a separate schema (usually on
another host).  Sometimes that can be done by simply vi'ing the file to
change the search_path and whatnot, though more often we end up piping
the whole thing through sed.  Since we don't allow regular users to do
much, and you have to 'set role postgres;' to do anything as superuser,
we also often end up adding 'set role postgres;' to the top of the .sql
files.

	Thanks,

		Stephen
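The pipe/sed rewriting described here can be sketched in a few lines of Python; `retarget_dump` is a hypothetical helper written for illustration only (real dumps may also carry schema-qualified names that a plain search_path rewrite does not touch):

```python
import re

def retarget_dump(sql, new_schema, role=None):
    """Rewrite a plain-text pg_dump script to load into a different
    schema, optionally switching role first (illustrative helper)."""
    # Point search_path at the target schema instead of the dumped one.
    out = re.sub(r"(?m)^SET search_path = [^;]*;",
                 "SET search_path = %s;" % new_schema, sql)
    if role is not None:
        # Mirror the manual habit of adding 'set role ...;' at the top.
        out = "SET ROLE %s;\n" % role + out
    return out
```

This is the moral equivalent of the sed pipeline, just easier to test.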
Re: [PATCHES] pg_dump additional options for performance
* Simon Riggs ([EMAIL PROTECTED]) wrote:
> The key capability here is being able to split the dump into multiple
> pieces.  The equivalent capability on restore is *not* required,
> because once the dump has been split the restore never needs to be.
> It might seem that the patch should be symmetrical with respect to
> pg_dump and pg_restore, but I see no use case for the pg_restore case.

I'm inclined to agree with this.  It might have been nice to provide a
way to split out already-created dumps, but I suspect that people who
need that probably have already figured out a way to do it (I know I
have..).  We should probably ensure that pg_restore doesn't *break* when
fed a partial dump.

> > The patch as submitted enforces what seem largely arbitrary
> > restrictions on combining these switches.
>
> I had it both ways at various points in development.  I'm happy with
> what you propose.

I agree with removing the restrictions.  I don't see much in the way of
use cases, but it reduces code and doesn't cause problems.

> > Another issue is that the rules for deciding which objects are
> > "before data" and which are "after data" are wrong.  In particular
> > ACLs are after data not before data, which is relatively easy to
> > fix.
>
> OK

This was partially why I was complaining about having documentation, and
a policy for that matter, which goes into more detail about why X is
before or after the data.  I agree that they're after today, but I don't
see any particular reason why they should be one or the other.  If we
adopted a policy like I proposed (--schema-post-data is essentially that
which uses the data and is faster done in bulk) then ACLs would be
before, and I tend to feel like it makes more sense that way.

> > Not so easy to fix is that COMMENTs might be either before or after
> > data depending on what kind of object they are attached to.
>
> Is there anything to fix?  Comments are added by calls to dumpComment,
> which are always made in conjunction with the dump of an object.  So
> if you dump the object you dump the comment.  As long as objects are
> correctly split out then comments will be also.

I agree with this, and it follows for BLOB comments: in any case, they
go with the object being dumped at the time of that object getting
dumped.  Comments make sense as an extension of the object, not as a
separate set of objects to be explicitly placed before or after the
data.

> > All of the above makes me certain I want to remove these options
> > from pg_restore.

I'm in agreement with this.

> > BTW, another incomplete item is that pg_dumpall should probably be
> > taught to accept and pass down --schema-before-data and
> > --schema-after-data switches.
>
> OK

I could go either way on this.

> Can we prune down to the base use case to avoid this overhead?
> i.e. have these options on pg_dump only?

Makes sense to me.

	Thanks,

		Stephen
Re: [PATCHES] pg_dump additional options for performance
* Tom Lane ([EMAIL PROTECTED]) wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > On Fri, 2008-07-25 at 19:16 -0400, Tom Lane wrote:
> > > The key problem is that pg_restore is broken:
> >
> > The key capability here is being able to split the dump into
> > multiple pieces.  The equivalent capability on restore is *not*
> > required, because once the dump has been split the restore never
> > needs to be.
>
> This argument is nonsense.  The typical usage of this capability,
> IMHO, will be
>
> 	pg_dump -Fc >whole.dump
> 	pg_restore --schema-before-data whole.dump >before.sql
> 	pg_restore --data-only whole.dump >data.sql
> 	pg_restore --schema-after-data whole.dump >after.sql
>
> followed by editing the schema pieces and then loading.

I dislike, and doubt that I'd use, this approach.  At the end of the
day, it ends up processing the same (very large amount of data) multiple
times.  We have 60G dump files sometimes, and there's no way I'm going
to dump that into a single file first if I can avoid it.  What we end up
doing today is --schema-only followed by vi'ing it and splitting it up
by hand, etc, then doing a separate --data-only dump.

> One reason is that this gives you a consistent dump, whereas three
> successive pg_dump runs could never guarantee any such thing.

While this is technically true, in most cases people have control over
the schema bits and would likely be able to ensure that the schema
doesn't change during the time.  At that point it's only the data, which
is still done in a transactional way.

> Another reason is that you may well not know when you prepare the dump
> that you will need split output, because the requirement to edit the
> dump is likely to be realized only when you go to load it.

This is a good point.  My gut reaction is that, at least in my usage, it
would be more "if it's larger than a gig, I might as well split it out,
just in case I need to touch something".  Honestly, it's rare that I
don't have to make *some* change.  Often that's the point of dumping it
out.

	Thanks,

		Stephen
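The by-hand splitting of a --schema-only dump described above can be approximated mechanically.  This is a deliberately naive sketch (prefix-based statement detection, an invented marker list), nothing like pg_restore's real dependency-aware ordering:

```python
# Statement prefixes that are typically better run after the data load;
# this list is an assumption for illustration, not pg_dump's own rule.
POST_LOAD_PREFIXES = ("CREATE INDEX", "CREATE UNIQUE INDEX",
                      "ALTER TABLE ONLY", "CREATE TRIGGER")

def split_schema(sql):
    """Split a plain-text schema-only dump into (pre_load, post_load)."""
    pre, post = [], []
    for stmt in sql.split(";\n"):
        target = post if stmt.lstrip().startswith(POST_LOAD_PREFIXES) else pre
        target.append(stmt)
    return ";\n".join(pre), ";\n".join(post)
```

A real dump needs real parsing; the point is only that the manual vi work is a mechanical classification.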
Re: [PATCHES] pg_dump additional options for performance
Simon,

* Simon Riggs ([EMAIL PROTECTED]) wrote:
> ...and with command line help also.

The documentation and whatnot looks good to me now.  There are a couple
of other issues I found while looking through and testing the patch
though:

Index: src/bin/pg_dump/pg_dump.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_dump.c,v
retrieving revision 1.497
diff -c -r1.497 pg_dump.c
*** src/bin/pg_dump/pg_dump.c	20 Jul 2008 18:43:30 -0000	1.497
--- src/bin/pg_dump/pg_dump.c	23 Jul 2008 17:04:24 -0000
***************
*** 225,232 ****
  	RestoreOptions *ropt;

  	static int	disable_triggers = 0;
! 	static int	outputNoTablespaces = 0;
  	static int	use_setsessauth = 0;

  	static struct option long_options[] = {
  		{"data-only", no_argument, NULL, 'a'},
--- 229,238 ----
  	RestoreOptions *ropt;

  	static int	disable_triggers = 0;
! 	static int	outputNoTablespaces = 0;
  	static int	use_setsessauth = 0;
+ 	static int	use_schemaBeforeData;
+ 	static int	use_schemaAfterData;

  	static struct option long_options[] = {
  		{"data-only", no_argument, NULL, 'a'},

This hunk appears to have a bit of gratuitous whitespace change, not a
big deal tho.

*** 464,474 ****
[...]
+ 	if (dataOnly)
+ 		dumpObjFlags = REQ_DATA;
+
+ 	if (use_schemaBeforeData == 1)
+ 		dumpObjFlags = REQ_SCHEMA_BEFORE_DATA;
+
+ 	if (use_schemaAfterData == 1)
+ 		dumpObjFlags = REQ_SCHEMA_AFTER_DATA;
+
+ 	if (schemaOnly)
+ 		dumpObjFlags = (REQ_SCHEMA_BEFORE_DATA | REQ_SCHEMA_AFTER_DATA);

It wouldn't kill to be consistent between testing for '== 1' and just
checking for non-zero.  Again, not really a big deal, and I wouldn't
mention these if there weren't other issues.

*** 646,652 ****
  	 * Dumping blobs is now default unless we saw an inclusion switch or -s
  	 * ... but even if we did see one of these, -b turns it back on.
  	 */
! 	if (include_everything && !schemaOnly)
  		outputBlobs = true;
--- 689,695 ----
  	 * Dumping blobs is now default unless we saw an inclusion switch or -s
  	 * ... but even if we did see one of these, -b turns it back on.
  	 */
! 	if (include_everything && WANT_PRE_SCHEMA(dumpObjFlags))
  		outputBlobs = true;

Shouldn't this change be to WANT_DATA(dumpObjFlags)?  That's what most
of the '!schemaOnly' get translated to.  Otherwise I think you would be
getting blobs when you've asked for just schema-before-data, which
doesn't seem like it'd make much sense.

*** 712,718 ****
  	dumpStdStrings(g_fout);

  	/* The database item is always next, unless we don't want it at all */
! 	if (include_everything && !dataOnly)
  		dumpDatabase(g_fout);

  	/* Now the rearrangeable objects. */
--- 755,761 ----
  	dumpStdStrings(g_fout);

  	/* The database item is always next, unless we don't want it at all */
! 	if (include_everything && WANT_DATA(dumpObjFlags))
  		dumpDatabase(g_fout);

  	/* Now the rearrangeable objects. */

Shouldn't this be 'WANT_PRE_SCHEMA(dumpObjFlags)'?

*** 3414,3420 ****
  			continue;

  		/* Ignore indexes of tables not to be dumped */
! 		if (!tbinfo->dobj.dump)
  			continue;

  		if (g_verbose)
--- 3459,3465 ----
  			continue;

  		/* Ignore indexes of tables not to be dumped */
! 		if (!tbinfo->dobj.dump || !WANT_POST_SCHEMA(dumpObjFlags))
  			continue;

  		if (g_verbose)

I didn't test this, but it strikes me as an unnecessary addition?  If
anything, wouldn't this check make more sense being done right after
dropping into getIndexes()?  No sense going through the loop just for
fun..  Technically, it's a behavioral change for --data-only since it
used to gather index information anyway, but it's a good optimization if
done in the right place.

Also around here, there doesn't appear to be any checking in
dumpEnumType(), which strikes me as odd.  Wouldn't that deserve a
"if (!WANT_PRE_SCHEMA(dumpObjFlags)) return;" check?  If not even some
kind of equivalent ->dobj.dump check..

*** 9803,9809 ****
  					tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId);
  	}

! 	if (!schemaOnly)
  	{
  		resetPQExpBuffer(query);
  		appendPQExpBuffer(query, "SELECT pg_catalog.setval(");
--- 9848,9854 ----
  					tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId);
  	}

!
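For readers following the review, here is a Python sketch of how the REQ_*/WANT_* flag logic in the hunks above appears intended to work; the names follow the patch, but the mapping from switches is my reading of the review, not code taken from the patch:

```python
# Bit flags for the three dump sections (values are arbitrary here).
REQ_SCHEMA_BEFORE_DATA = 1 << 0
REQ_DATA               = 1 << 1
REQ_SCHEMA_AFTER_DATA  = 1 << 2

def dump_obj_flags(schema_only=False, data_only=False,
                   before_data=False, after_data=False):
    """Map the pg_dump switches onto a section bitmask (default: all)."""
    flags = REQ_SCHEMA_BEFORE_DATA | REQ_DATA | REQ_SCHEMA_AFTER_DATA
    if data_only:
        flags = REQ_DATA
    if before_data:
        flags = REQ_SCHEMA_BEFORE_DATA
    if after_data:
        flags = REQ_SCHEMA_AFTER_DATA
    if schema_only:
        flags = REQ_SCHEMA_BEFORE_DATA | REQ_SCHEMA_AFTER_DATA
    return flags

# The WANT_* macros then reduce to bit tests:
def want_data(flags):        return bool(flags & REQ_DATA)
def want_pre_schema(flags):  return bool(flags & REQ_SCHEMA_BEFORE_DATA)
def want_post_schema(flags): return bool(flags & REQ_SCHEMA_AFTER_DATA)
```

Seen this way, the review's question is simply whether a given call site should test the data bit or the before-data bit.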
Re: [PATCHES] pg_dump additional options for performance
* Simon Riggs ([EMAIL PROTECTED]) wrote:
> The options split the dump into 3 parts, that's all: before the load,
> the load and after the load.
>
> --schema-pre-load says "Dumps exactly what <option>--schema-only</>
> would dump, but only those statements before the data load."
>
> What is it you are suggesting?  I'm unclear.

That part is fine.  The problem is that elsewhere in the documentation
(patch line starting ~774 before, ~797 after, to the pg_dump.sgml) you
change it to be "objects required before data loading", which isn't the
same.

	Thanks,

		Stephen
Re: [PATCHES] pg_dump additional options for performance
Simon,

* Simon Riggs ([EMAIL PROTECTED]) wrote:
> > I hadn't realized that Simon was using "pre-schema" and
> > "post-schema" to name the first and third parts.  I'd agree that
> > this is confusing nomenclature: it looks like it's trying to say
> > that the data is the schema, and the schema is not!  How about
> > "pre-data" and "post-data"?
>
> OK by me.  Any other takers?

Having the command-line options be --schema-pre-data and
--schema-post-data is fine with me.  Leaving them the way they are is
also fine by me.  It's the documentation (back to pg_dump.sgml,
~774/~797) that starts talking about "Pre-Schema" and "Post-Schema".

	Thanks,

		Stephen
Re: [PATCHES] pg_dump additional options for performance
Tom, et al,

* Tom Lane ([EMAIL PROTECTED]) wrote:
> Ah, I see.  No objection to those switch names, at least assuming we
> want to stick to positive-logic switches.  What did you think of the
> negative-logic suggestion (--omit-xxx)?

My preference is for positive-logic switches in general.  The place
where I would use this patch would lend itself to being more options if
--omit- were used.  I expect that would hold true for most people.  It
would be:

	--omit-data --omit-post-load
	--omit-pre-load --omit-post-load
	--omit-pre-load --omit-data

vs.

	--schema-pre-load
	--data-only
	--schema-post-load

Point being that I'd be dumping these into separate files where I could
more easily manipulate the pre-load or post-load files.  I'd still want
pre/post load to be separate though, since this would be used in cases
where there's a lot of data (hence the reason for the split) and putting
pre and post together and running them before the data would slow things
down quite a bit.

Are there use cases for just --omit-post-load or --omit-pre-load?
Probably, but I just don't see any situation where I'd use them like
that.

	Thanks,

		Stephen
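The combinatorial point can be made concrete with a toy selector; `selected_sections` and the section names are illustrative only:

```python
# With negative logic you need two --omit flags to isolate one section;
# with positive logic one switch does it.  Purely an illustration of the
# argument above, not pg_dump's option handling.
SECTIONS = ("pre-load", "data", "post-load")

def selected_sections(omit=(), only=None):
    if only is not None:          # positive logic: one switch selects one part
        return (only,)
    return tuple(s for s in SECTIONS if s not in omit)
```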
Re: [PATCHES] pg_dump additional options for performance
Simon,

* Simon Riggs ([EMAIL PROTECTED]) wrote:
> On Sun, 2008-07-20 at 05:47 +0100, Simon Riggs wrote:
> > On Sat, 2008-07-19 at 23:07 -0400, Stephen Frost wrote:
[...]
> > > - Conflicting option handling

Thanks for putting in the extra code to explicitly indicate which
conflicting options were used.

> > > - Documentation
> > >   When writing the documentation I would stress that pre-schema
> > >   and post-schema be defined in terms of PostgreSQL objects and
> > >   why they are pre vs. post.

Perhaps this is up for some debate, but I find the documentation added
for these options to be lacking the definitions I was looking for, and
the explanation of why they are what they are.  I'm also not sure I
agree with the "Pre-Schema" and "Post-Schema" nomenclature as it doesn't
really fit with the option names or what they do.  Would you consider:

   <term><option>--schema-pre-load</option></term>
   <listitem>
    <para>
     Pre-Data Load - Minimum amount of the schema required before data
     loading can begin.  This consists mainly of creating the tables
     using the <command>CREATE TABLE</command> command.  This part can
     be requested using <option>--schema-pre-load</>.
    </para>
   </listitem>

   <term><option>--schema-post-load</option></term>
   <listitem>
    <para>
     Post-Data Load - The rest of the schema definition, including
     keys, indexes, etc.  By putting keys and indexes after the data
     has been loaded, the whole process of restoring data is much
     faster.  This is because it is faster to build indexes and check
     keys in bulk than piecemeal as the data is loaded.  This part can
     be requested using <option>--schema-post-load</>.
    </para>
   </listitem>

Even this doesn't cover everything though- it's too focused on tables
and data loading.  Where do functions go?  What about types?

A couple of additional points:

- The command-line help hasn't been updated.  Clearly, that also needs
  to be done to consider the documentation aspect complete.

- There appears to be a bit of mistakenly included additions.  The
  patch to pg_restore.sgml attempts to add in documentation for
  --superuser.  I'm guessing that was unintentional, and looks like
  just a mistaken extra copy&paste.

- Technically, the patch needs to be updated slightly since another
  pg_dump-related patch was committed recently which also added options
  and thus causes a conflict.  I think this might have just happened
  again, funny enough.  It's something that a committer could perhaps
  fix, but if you're reworking the patch anyway...

	Thanks,

		Stephen
Re: [PATCHES] pg_dump additional options for performance
* Simon Riggs ([EMAIL PROTECTED]) wrote:
> On Sun, 2008-07-20 at 17:43 -0400, Stephen Frost wrote:
> > Even this doesn't cover everything though- it's too focused on
> > tables and data loading.  Where do functions go?  What about types?
>
> Yes, it is focused on tables and data loading.  What about
> functions/types?  No relevance here.

I don't see how they're not relevant; it's not like they're being
excluded, and in fact they show up in the pre-load output.  Heck, even
if they *were* excluded, that should be made clear in the documentation
(either by an explicit include list, or by saying they're excluded).
Part of what's driving this is making sure we have a plan for future
objects and where they'll go.

Perhaps it would be enough to just say pre-load is everything in the
schema, except things which are faster done in bulk (eg: indexes,
keys).  I don't think it's right to say pre-load is "only object
definitions required to load data" when it includes functions and ACLs
though.  Hopefully my suggestion and these comments will get us to a
happy middle-ground.

	Thanks,

		Stephen
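The proposed policy ("pre-load is everything except what is faster done in bulk") amounts to a one-line classifier; the object lists below are examples of the proposal under discussion, not pg_dump's actual behaviour:

```python
# Object types that are faster created in bulk after the data is loaded,
# per the proposed policy; membership here is illustrative.
BULK_FASTER = {"index", "primary key", "foreign key", "unique constraint"}

def section_for(object_type):
    """Classify an object type as pre-load or post-load under the policy."""
    return "post-load" if object_type in BULK_FASTER else "pre-load"
```

Under this rule functions, types, and ACLs all land in pre-load by default, which is the point being argued above.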
Re: [PATCHES] pg_dump additional options for performance
* daveg ([EMAIL PROTECTED]) wrote:
> One observation: indexes should be built right after the table data is
> loaded for each table.  This way the index build gets a hot cache for
> the table data, instead of having to re-read it later as we do now.

That's not how pg_dump has traditionally worked, and the point of this
patch is to add options to easily segregate the main pieces of the
existing pg_dump output (main schema definition, data dump, key/index
building).  Your suggestion brings up an interesting point: should
pg_dump's traditional output structure change, the --schema-post-load
set of objects wouldn't be as clear to newcomers, since the load and the
indexes would be interleaved in the regular output.

I'd be curious about the performance impact this has on an actual load
too.  It would probably be more valuable on smaller loads, where it
would have less of an impact anyway, than on loads larger than the cache
size.  Still, not an issue for this patch, imv.

	Thanks,

		Stephen
[PATCHES] pg_dump additional options for performance
Simon,

I agree with adding these options in general, since I find myself
frustrated by having to vi huge dumps to change simple schema things.
A couple of comments on the patch though:

- Conflicting option handling
  I think we are doing our users a disservice by putting it on them to
  figure out exactly what:

    "multiple object groups cannot be used together"

  means to them.  You and I may understand what an object group is, and
  why there can be only one, but it's a great deal less clear than the
  prior message of:

    "options -s/--schema-only and -a/--data-only cannot be used together"

  My suggestion would be to either list out the specific options which
  can't be used together, as was done previously, or add a bit of (I
  realize, boring) code and actually tell the user which of the
  conflicting options were used.

- Documentation
  When writing the documentation I would stress that pre-schema and
  post-schema be defined in terms of PostgreSQL objects and why they
  are pre vs. post.

- Technically, the patch needs to be updated slightly since another
  pg_dump-related patch was committed recently which also added options
  and thus causes a conflict.

Beyond those minor points, the patch looks good to me.

	Thanks,

		Stephen
Re: [PATCHES] Feature: give pg_dump a WHERE clause expression
* daveg ([EMAIL PROTECTED]) wrote:
> The feature that the proposed patch enables is to create pg_dump
> custom format archives for multiple tables with a predicate.  No
> amount of csv or xml will do that.  Contrived example:

Uh, pg_dump's custom format really isn't particularly special, to be
honest.  Is there some reason you're interested in using it over, as was
suggested, COPY/CSV/etc format?  You could, of course, gzip the output
of COPY (as pg_dump does) if you're concerned about space..

	Thanks,

		Stephen
Re: [PATCHES] column level privileges
* Andrew Dunstan ([EMAIL PROTECTED]) wrote:
> Tom Lane wrote:
> > I'm not sure where we go from here.  Your GSOC student has
> > disappeared, right?  Is anyone else willing to take up the patch and
> > work on it?
>
> No, he has not disappeared at all.  He is going to work on fixing
> issues and getting the work up to SQL99 level.

Great!

> Your review should help enormously.  Stephen, perhaps you would like
> to work with him.

I'd be happy to.

	Thanks,

		Stephen
Re: [PATCHES] column level privileges
Tom, et al,

* Tom Lane ([EMAIL PROTECTED]) wrote:
> I'm not sure where we go from here.  Your GSOC student has
> disappeared, right?  Is anyone else willing to take up the patch and
> work on it?

I'm willing to take it up and work on it.  I'm very interested in having
column-level privileges in PG.  I also have some experience in the
gram.y and ACL areas already that should make things go quickly.  If
anyone else is interested/has resources, please let me know.

> Admittedly the patch's syntax is more logical (especially if you
> consider the possibility of multiple tables) but I don't think we can
> go against the spec.  This problem invalidates the gram.y changes and
> a fair amount of what was done in aclchk.c.

Agreed, we need to use the SQL spec syntax.

> 2. The checking of INSERT/UPDATE permissions has been moved to a
> completely unacceptable time/place, namely during parse analysis
> instead of at the beginning of execution.  This is unusable for
> prepared queries, for example, and it also fails to apply permission
> checking properly for UPDATEs of inheritance trees (only the parent
> would get checked).  This seems not very simple to fix :-(.  By the
> time the plan gets to the executor it is not clear which columns were
> actually specified as targets by the user and which were filled in as
> defaults by the rewriter or planner.  One possible solution is to add
> a flag field to TargetEntry to carry the information forward.

I'll look into this; I liked the bitmap idea, personally.

> Some other points that need to be considered:
>
> > 1. The execution of GRANT/REVOKE for column privileges.  Now only
> > INSERT/UPDATE/REFERENCES privileges are supported, as SQL92
> > specified.  SELECT privilege is now not supported.
>
> Well, SQL99 has per-column SELECT privilege, so we're already behind
> the curve here.  The main problem again is to figure out a reasonable
> means for the executor to know which columns to check.  TargetEntry
> markings won't help here.  I thought about adding a bitmap to each RTE
> showing the attribute numbers explicitly referenced in the query, but
> I'm unsure if that's a good solution or not.  I'd be willing to leave
> this as work to be done later, since 90% of the patch is just
> concerned with the mechanics of storing per-column privileges and
> doesn't care which ones they are exactly.  But it needs to be on the
> to-do list.

I think it would be quite unfortunate to not include per-column SELECT
privileges with the initial version.  It has significant uses and would
really be a pretty obvious hole in our implementation.

> What I think would be a more desirable solution is that the table ACL
> still sets the table-level INSERT or UPDATE privilege bit as long as
> you have any such privilege.  In the normal case where no per-column
> privilege has been granted, the per-column attacl fields all remain
> NULL and that's all you need.  As soon as any per-column GRANT or
> REVOKE is issued against a table, expand out the per-column ACLs to
> match the previous table-level state, and then apply the per-column
> changes.  I think you'd need an additional pg_class flag column, say
> relhascolacls, to denote whether this has been done --- then privilege
> checking would know it only needs to look at the column ACLs when this
> field is set.

I agree with this approach and feel it's a lot cleaner as well as
faster.  We definitely don't want to make permission checking take any
more time than it absolutely has to.

> With this scheme we don't need per-column entries in pg_shdepend, we
> can just reference the table-level bits as before.  REVOKE would have
> the responsibility of getting rid of per-column entries, if any, as a
> followup to revoking per-table entries during a DROP USER operation.

Doesn't sound too bad.

> Something else that needs to be thought about is whether system
> columns have privileges or not.  The patch seems to be assuming not in
> some places, but at least for SELECT it seems like this might be
> sensible.  Also, you can already do COPY TO the OID column if any, so
> even without any future extensions it seems like we've got the issue
> in front of us.

I certainly feel we should be able to have per-column privileges on
system columns, though we should only use them where appropriate, of
course.

> One other mistake I noted was that the version checks added in pg_dump
> and psql are >= 80300, which of course is obsolete now.

That one's pretty easy to handle. :)

	Thanks,

		Stephen
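Tom's relhascolacls scheme gives privilege checking a cheap fast path.  Here is a sketch of the intended check order, with an invented dict-based data model standing in for pg_class/pg_attribute; only the ordering of the three tests reflects the proposal:

```python
def has_column_privilege(table, role, priv, column):
    """Check a privilege the way the proposal above describes:
    table-level bit first, per-column ACLs only when any exist."""
    # 1. A table-level grant covers every column.
    if priv in table["acl"].get(role, set()):
        return True
    # 2. Fast path: no per-column ACLs were ever created for this table.
    if not table["relhascolacls"]:
        return False
    # 3. Slow path: consult the per-column ACL, if one exists.
    col_acl = table["colacls"].get(column, {})
    return priv in col_acl.get(role, set())
```

In the common case (no column grants) the check costs one extra boolean test over today's behaviour, which is the attraction of the scheme.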
Re: [PATCHES] Proposed patch to disallow password=foo in database name parameter
* Tom Lane ([EMAIL PROTECTED]) wrote:
> Anybody think this is good, bad, or silly?  Does the issue need
> explicit documentation, and if so where and how?

I'm going to have to vote 'silly' on this one.  While I agree that in
general we should discourage, and not provide explicit command-line
options for, passing a password on the command-line, I don't feel that
it makes sense to explicitly complicate things to prevent it.

Just my 2c,

	Thanks,

		Stephen
Re: [PATCHES] allow CSV quote in NULL
* Gregory Stark ([EMAIL PROTECTED]) wrote:
> Tom Lane <[EMAIL PROTECTED]> writes:
> > Stephen Frost <[EMAIL PROTECTED]> writes:
> > > Please find attached a minor patch to remove the constraints that
> > > a user can't include the delimiter or quote characters in a 'NULL
> > > AS' string when importing CSV files.
> >
> > This can't really be sane can it?
>
> Including unquoted delimiter characters seems kind of insane.  But
> including the quote characters or quoted delimiter characters seems
> reasonable.

Right.  We could write something to check if the user provides an
unquoted delimiter character, but I'm not really sure it's worth it, to
be perfectly honest.  If it'll make people happier with the patch I can
do it though.

> The alternative would be interpreting NULL strings after dequoting,
> but that would leave no way to include the NULL string literally.
> This solution means there's no way to include it (if it needs quoting)
> but only when you specify it this way.

Yeah, interpreting NULLs after dequoting means you've lost the
information about whether it was quoted, or you have to add some funky
syntax to say "if it's quoted, do it differently...", which is no good,
imv.

What the patch does, basically, is say "give us the exact string that
shows up between the unquoted delimiters that you want to be treated as
a NULL".  This removes the complexity of the question about quoting,
unquoting, whatever, and makes it a very clear-cut, straight-forward
solution with no impact on existing users, imv.

	Thanks,

		Stephen
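The "match the raw string before dequoting" rule can be shown with a toy parser.  This handles only a single field with no embedded delimiters; the real logic lives in copy.c's CSV reader, and `parse_field` is written purely for illustration:

```python
def parse_field(raw, null_marker, quote='"'):
    """Return None (NULL) if the raw, pre-dequote token matches the
    marker exactly; otherwise dequote and return the value."""
    if raw == null_marker:
        return None                      # matched against the raw token
    if len(raw) >= 2 and raw.startswith(quote) and raw.endswith(quote):
        return raw[1:-1].replace(quote * 2, quote)   # strip/undouble quotes
    return raw
```

Because the comparison happens before dequoting, `NULL AS '""'` catches a quoted empty string while a literal empty string stays distinct, which is exactly the behaviour the patch is after.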
[PATCHES] allow CSV quote in NULL
Greetings,

Please find attached a minor patch to remove the constraints that a user
can't include the delimiter or quote characters in a 'NULL AS' string
when importing CSV files.

This allows a user to explicitly request that NULL conversion happen on
fields which are quoted.  As the quote character is being allowed to be
in the 'NULL AS' string now, there's no reason to exclude the delimiter
character from being seen in that string as well, though unless quoted
using the CSV quote character it won't ever be matched.

An example of the usage:

sfrost*=# \copy billing_data from ~/BillingSamplePricerFile.csv with csv header quote as '"' null as '""'

This is no contrived example; it's an issue I ran into earlier today
when I got a file which had (for reasons unknown to me and not easily
changed upstream):

1,V,WASHDCABC12,"",120033...

Both of the ending columns shown are integer fields, the "" here being
used to indicate a NULL value.  Without the patch, an ERROR occurs:

sfrost=> \copy billing_data from ~/BillingSamplePricerFile.csv with csv header quote as '"'
ERROR:  invalid input syntax for integer: ""

And there's no way to get it to import with COPY CSV mode.  The patch
adds this ability without affecting existing usage or changing the
syntax.  Even with the patch, an ERROR occurs with the default treatment
of CSV files:

sfrost=# \copy billing_data from ~/BillingSamplePricerFile.csv with csv header quote as '"'
ERROR:  invalid input syntax for integer: ""

Which would be expected.  If the file is modified to remove the ""s for
NULL columns, it imports just fine with the syntax above.

It'd be really nice to have this included.  Thanks!

	Stephen

Index: src/backend/commands/copy.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.285
diff -c -r1.285 copy.c
*** src/backend/commands/copy.c	20 Jun 2007 02:02:49 -0000	1.285
--- src/backend/commands/copy.c	27 Jul 2007 04:13:15 -0000
***************
*** 926,944 ****
  				 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  				  errmsg("COPY force not null only available using COPY FROM")));

- 	/* Don't allow the delimiter to appear in the null string. */
- 	if (strchr(cstate->null_print, cstate->delim[0]) != NULL)
- 		ereport(ERROR,
- 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- 				 errmsg("COPY delimiter must not appear in the NULL specification")));
-
- 	/* Don't allow the CSV quote char to appear in the null string. */
- 	if (cstate->csv_mode &&
- 		strchr(cstate->null_print, cstate->quote[0]) != NULL)
- 		ereport(ERROR,
- 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- 				 errmsg("CSV quote character must not appear in the NULL specification")));
-
  	/* Disallow file COPY except to superusers. */
  	if (!pipe && !superuser())
  		ereport(ERROR,
--- 926,931 ----
***************
*** 2888,2894 ****
  {
  	bool		found_delim = false;
  	bool		in_quote = false;
- 	bool		saw_quote = false;
  	char	   *start_ptr;
  	char	   *end_ptr;
  	int			input_len;
--- 2875,2880 ----
***************
*** 2921,2927 ****
  		/* start of quoted field (or part of field) */
  		if (c == quotec && !in_quote)
  		{
- 			saw_quote = true;
  			in_quote = true;
  			continue;
  		}
--- 2907,2912 ----
***************
*** 2971,2977 ****
  		/* Check whether raw input matched null marker */
  		input_len = end_ptr - start_ptr;
! 		if (!saw_quote && input_len == cstate->null_print_len &&
  			strncmp(start_ptr, cstate->null_print, input_len) == 0)
  			fieldvals[fieldno] = NULL;
--- 2956,2962 ----
  		/* Check whether raw input matched null marker */
  		input_len = end_ptr - start_ptr;
! 		if (input_len == cstate->null_print_len &&
  			strncmp(start_ptr, cstate->null_print, input_len) == 0)
  			fieldvals[fieldno] = NULL;
Re: [PATCHES] SSPI authentication - patch
* Magnus Hagander ([EMAIL PROTECTED]) wrote: On Thu, Jul 19, 2007 at 06:22:57PM -0400, Stephen Frost wrote: My thinking would be to have the autoconf to disable it, but enable it by default. I don't feel particularly strongly about it though. Do you see a use-case where someone would disable it? I'll be happy to add the switch if you do, it's not hard to do, but adding a switch just for the sake of adding a switch is not something I like :-) Eh, I could contrive one but, as I said, I don't feel particularly strongly about it. How about we go w/o it for now and see if anyone asks for it. I understand that SSPI is case-insensitive, or folds to uppercase, or whatever, but this is *not* used only by the SSPI code. Please correct me if I'm wrong, but this will break existing krb-auth using client applications/setups that went with the previous default, no? I realize it's on Windows, but there are people out there with that configuration (yes, like me... :)... Ok, first to clarify the facts: * SSPI is case-insensitive, case-preserving * The problem is not from SSPI. It's Active Directory. If you use AD as the KDC, you must use uppercase SPNs - regardless of SSPI. For example, it's needed for anybody wanting to use the old krb5 auth in 8.x together with Active Directory - like I do :-) Ah, thanks for clearing up where the problem arises from. The change is there because the majority of windows installs will be using Active Directory, at least that's what I would expect. Certainly not all, but most. It's a way of lowering the bar for the majority, at the expense of the minority ;-) It's also at the expense of backwards compatibility. :/ People who are currently using the krb5 auth mechanism with AD are used to having to flip that or set the environment variable while people who have been using it with an MIT KDC may get surprised by it. That said, I actually intended to submit that as a separate patch for separate discussion.
If people are against it, I'll be happy to drop that part. My main concern is that it's a backward-incompatible change. I realize that it's likely going in the direction of the majority on Windows but it seems to me like it's not something we should just 'do'. That said, I don't see it as a problem for me since I've got a reasonably small user-base (10s, not 100s or 1000s) of Windows users and setting the environment variable shouldn't be an issue. Again, it's not related to the library used, it's related to the KDC. And we can't detect that, at least not early enough. That's true, but if we used upper-case with something NEW (SSPI) while keeping it the same for the OLD (KRB5, and I'd vote GSSAPI) then we're not breaking backwards compatibility while also catering to the masses. I guess I don't see too many people using SSPI w/ an MIT KDC, and it wasn't possible previously anyway. What do you think? Thanks! Stephen signature.asc Description: Digital signature
Re: [PATCHES] SSPI authentication - patch
* Magnus Hagander ([EMAIL PROTECTED]) wrote: Here's an updated version of this patch. This version has full SSPI support in the server as well, so I can do both kerberos and NTLM between two windows machines using the negotiate method. Great! Also, I've tested that it works under Windows using PGGSSLIB=gssapi with the MIT GSS libraries. I did have to set the PGKRBSRVNAME to 'postgres'. It worked excellently. :) Since SSPI and GSSAPI can now both be used, my plan is not to have an autoconf to disable SSPI, but to just enable it unconditionally on win32. Or does this seem like a bad idea? My thinking would be to have the autoconf to disable it, but enable it by default. I don't feel particularly strongly about it though. Comments welcome. It looks good in general to me (though I'm not super-familiar with SSPI). My one big concern is this: /* Define to the name of the default PostgreSQL service principal in Kerberos. (--with-krb-srvnam=NAME) */ ! #define PG_KRB_SRVNAM postgres /* A string containing the version number, platform, and C compiler */ #define PG_VERSION_STR Uninitialized version string (win32) --- 582,588 /* Define to the name of the default PostgreSQL service principal in Kerberos. (--with-krb-srvnam=NAME) */ ! #define PG_KRB_SRVNAM POSTGRES I understand that SSPI is case-insensitive, or folds to uppercase, or whatever, but this is *not* used only by the SSPI code. Please correct me if I'm wrong, but this will break existing krb-auth using client applications/setups that went with the previous default, no? I realize it's on Windows, but there are people out there with that configuration (yes, like me... :)... I don't particularly like it but, honestly, it seems like it might be better to set it based on what's being used (GSSAPI/SSPI/KRB5)? This would be for the client-side, as I guess we've decided it's okay to just pick whatever keytab the users provide that's in our server-side keytab. Thanks again! Stephen signature.asc Description: Digital signature
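For reference, the client-side knobs mentioned above can be combined like this. PGGSSLIB and PGKRBSRVNAME are the real libpq environment variables; the host and database names are invented, and the psql line is illustrative only:

```shell
# Select the MIT GSSAPI client code and a lowercase service principal,
# matching an MIT KDC rather than Active Directory's uppercase SPNs.
export PGGSSLIB=gssapi
export PGKRBSRVNAME=postgres
# psql -h db.example.com mydb    # hypothetical connection attempt
echo "$PGGSSLIB $PGKRBSRVNAME"   # prints: gssapi postgres
```

An AD-backed setup would instead set PGKRBSRVNAME=POSTGRES (or rely on whatever default the server side was built with).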
Re: [PATCHES] dblink connection security
* Joe Conway ([EMAIL PROTECTED]) wrote: Get serious. Internal functions are specifically designed and maintained to be safe within the confines of the database security model. We are discussing extensions to the core, all of which must be installed by choice, by a superuser. Extensions should also be designed and maintained to be safe within the confines of the database security model. Having to be installed by a superuser doesn't change that. I would consider it a serious risk which would need to be fixed if, for example, a function in PostGIS was found to allow privilege escalation by a user. Claiming it was installed by choice, by a superuser doesn't change that. It's about as good as saying "Well, an admin had to install PostgreSQL on the system, by choice, and therefore we don't need to worry about PG allowing someone remote shell access to the system." Thanks, Stephen signature.asc Description: Digital signature
Re: [PATCHES] dblink connection security
* Tom Lane ([EMAIL PROTECTED]) wrote: Joe Conway [EMAIL PROTECTED] writes: But if you know of a security risk related to using libpq with a password authenticated connection, let's hear it. As near as I can tell, the argument is that dblink might be used to send connection-request packets to random addresses. Yes. Now this is only a security issue if the attacker could not have reached such an address directly; otherwise he might as well send the packet himself (and have a lot more control over its content). No. Being able to come from a different address is valuable even if you can get to that address directly yourself. So I guess the scenario is that you're running your database on your firewall machine, where it is accessible from outside your net but also can reach addresses inside. It wouldn't need to be on your firewall, just behind it, which is extremely common. And you're letting untrustworthy outside people log into the database. It's not nearly so convoluted. SQL injections happen. And you put dblink on it for them to use. And even then, the amount of damage they could do seems pretty limited due to lack of control over the packet contents. dblink could have been installed for a variety of reasons. Making it openly available on install makes it much less likely any additional restrictions were placed on it. To me this scenario is too far-fetched to justify sacrificing convenience and backwards compatibility. It should be sufficient to add some paragraphs about security considerations to the dblink docs. I feel that requiring a sysadmin to issue a 'grant' if they want that convenience is justified and reasonable. We could include the statement itself in the documentation we're expecting them to read anyway so they can just copy/paste it. Adding paragraphs to the documentation is good but doesn't justify an insecure-by-default approach. Regardless of what core ends up doing, I'm hopeful it'll be disabled by default under Debian.
It'd certainly be easier if it was done upstream. Thanks, Stephen signature.asc Description: Digital signature
Re: [PATCHES] dblink connection security
* Gregory Stark ([EMAIL PROTECTED]) wrote: Joe Conway [EMAIL PROTECTED] writes: If there are no objections I'll commit this later today. My objection is that I think we should still revoke access for non-superuser by default. The patch makes granting execute reasonable for most users but nonetheless it shouldn't be the default. Being able to connect to a postgres server shouldn't mean being able to open tcp connections *from* that server to arbitrary other host/ports. Consider for example that it would allow a user to perform a port scan from inside your network to see what internal services are running. I'm in agreement with Greg. It's a poor idea, overall, to allow users to initiate TCP connections from the backend. That should be a superuser-only ability and should require security definer functions with appropriate safe-guards (which would be site-specific) to be created by the end admins. Thanks, Stephen signature.asc Description: Digital signature
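The lockdown being argued for here can be sketched in SQL. This assumes dblink installed in the public schema; the connection string, wrapper name, and report_role are invented for illustration:

```sql
-- Take the ability to open outbound connections away from ordinary users
-- (repeat for the other dblink* signatures as appropriate):
REVOKE EXECUTE ON FUNCTION dblink(text, text) FROM PUBLIC;

-- Expose only a vetted, site-specific query through a SECURITY DEFINER
-- wrapper created by the superuser:
CREATE FUNCTION daily_report(OUT id int, OUT total numeric)
RETURNS SETOF record AS $$
    SELECT * FROM dblink('host=reports.internal dbname=stats user=readonly',
                         'SELECT id, total FROM daily_totals')
             AS t(id int, total numeric)
$$ LANGUAGE sql SECURITY DEFINER;

REVOKE EXECUTE ON FUNCTION daily_report() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION daily_report() TO report_role;
```

Members of report_role can then run `SELECT * FROM daily_report();` without ever holding execute rights on dblink itself, and without any say over which host the backend connects to.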
Re: [PATCHES] dblink connection security
* Gregory Stark ([EMAIL PROTECTED]) wrote: Actually from a security point of view revoking public execute is pretty much the same as making a function super-user-only. The only difference is how much of a hassle it is for the super-user to grant access. Perhaps we should reconsider whether any of the other super-user-only functions should be simply not executable by default but work normally if granted. Revoking public execute on it by default would definitely make me happier. I could be swayed either way on the explicit super-user check in the function itself. In the general case, imv we should at least attempt to consider the risk involved in improper handling of the permissions around super-user-only functions. Higher risk implies an extra check in the code to force use of SECURITY DEFINER functions to work around it, in an attempt to impart the severity of the risk. Thinking about it a bit more, I'd honestly like to see the check there for dblink(). That's not entirely fair of me though I suppose, I really don't feel comfortable with dblink() to begin with and don't expect I'll ever use it. :) Thanks, Stephen signature.asc Description: Digital signature
Re: [PATCHES] dblink connection security
* Joe Conway ([EMAIL PROTECTED]) wrote: If you are going to argue that we should revoke access for non-superusers by default for dblink, then you are also arguing that we should do the same for every function created with any untrusted language. Uh, no, one doesn't imply the other. It doesn't follow that because a specific, known insecure, function shouldn't be available to all users immediately that quite probably safe/secure functions (even though they're written in an untrusted language- what has that got to do with anything?) also shouldn't be. E.g. as I pointed out to Robert last week, just because an unsafe function is created in plperlu, it doesn't mean that a non-superuser can't run it immediately after it is created. There is no difference. It is incumbent upon the DBA/superuser to be careful _whenever_ they create any function using an untrusted language. This isn't a case of the DBA/superuser writing the function. It's being provided by a package. It's also *inherently* insecure and isn't just a matter of being careful. You can create functions in an untrusted language carefully enough to allow them to be called by other users. It is simply prudent for the package provider to disable insecure functions by default. Thanks, Stephen signature.asc Description: Digital signature
Re: [PATCHES] dblink connection security
* Joe Conway ([EMAIL PROTECTED]) wrote: Consider a scenario like package x uses arbitrary function y in an untrusted language z. Exact same concerns arise. No, it doesn't... Said arbitrary function in y, in untrusted language z, could be perfectly safe for users to call. Being written in an untrusted language has got next to nothing to do with the security implications of a particular function. It depends entirely on what the function is *doing*, not what language it's written in. Thanks, Stephen signature.asc Description: Digital signature
Re: [PATCHES] dblink connection security
* Joe Conway ([EMAIL PROTECTED]) wrote: Stephen Frost wrote: No, it doesn't... Said arbitrary function in y, in untrusted language z, could be perfectly safe for users to call. *Could* be. But we just said that the admin was not interested in reading the documentation, and has no idea if it *is* safe. And, it very well might not be safe. We have no way to know in advance because the language is untrusted. If it's not safe then it shouldn't be enabled by default. That's pretty much the point. If something is known to be unsafe for users to have access to then it should be disabled by default. Being written in an untrusted language has got next to nothing to do with the security implications of a particular function. It depends entirely on what the function is *doing*, not what language it's written in. Sure it matters. A function written in a trusted language is known to be safe, a priori. A function written in an untrusted language has no such guarantees, and therefore has to be assumed unsafe unless carefully proved otherwise. I see.. So all the functions in untrusted languages that come with PG initially should be checked over by every sysadmin when installing PG every time... And the same for PostGIS, and all of the PL's that use untrusted languages? On my pretty modest install that's 2,206 functions. For some reason I see something of a difference between 'generate_series' and 'dblink' in terms of security and which one I'm comfortable having enabled by default and which one I'm not. Thanks, Stephen signature.asc Description: Digital signature
Re: [PATCHES] dblink connection security
* Gregory Stark ([EMAIL PROTECTED]) wrote: Joe Conway [EMAIL PROTECTED] writes: Consider a scenario like package x uses arbitrary function y in an untrusted language z. Exact same concerns arise. Well arbitrary function may or may not actually do anything that needs to be restricted. If it does then yes the same concerns arise and the same conclusion reached. That users should be granted permission to execute it based on local policies. Certainly granting execute permission to public by default is a bad start in that regard. Agreed, and regardless of the sysadmin doing x, y, or z, or what some other package might be doing with untrusted languages, what matters here is what we're doing and the functions we're providing. Best practice is to disable functions by default which aren't safe & secure for users to have access to. If you know of any others in anything we're distributing, please point them out. If there are some in related projects, point those out and ask those projects to be careful with them and encourage them to disable them by default. Thanks, Stephen signature.asc Description: Digital signature
Re: [PATCHES] dblink connection security
* Joe Conway ([EMAIL PROTECTED]) wrote: Stephen Frost wrote: I see.. So all the functions in untrusted languages that come with PG initially should be checked over by every sysadmin when installing PG every time... And the same for PostGIS, and all of the PL's that use untrusted languages? There are none installed by default -- that's the point. Uhh... None what? Functions in untrusted languages? That's certainly not the case, there's a whole slew of them, from boolin to generate_series and beyond. They're available to regular users, even! Or do you mean that there are no known-insecure functions which are installed and enabled for users to use by default? I'd have to agree with you there in general, would kind of like to keep it that way too. Perhaps you're referring to PLs, but then, I thought trusted PLs were safe, but they're written using untrusted languages! Are they safe, or not? Safe to use, but not safe to install? On my pretty modest install that's 2,206 functions. For some reason I see something of a difference between 'generate_series' and 'dblink' in terms of security and which one I'm comfortable having enabled by default and which one I'm not. generate_series is a built-in function. We aren't discussing those. Uh, it's written in an untrusted language, isn't it? Us poor sysadmins are supposed to review all of them before letting users have access to them, aren't we? Now I'm just completely confused as to the distinction you're making here. Are functions in untrusted languages a problem, or not? Thanks, Stephen signature.asc Description: Digital signature
Re: [PATCHES] Preliminary GSSAPI Patches
* Henry B. Hotz ([EMAIL PROTECTED]) wrote: On Jun 22, 2007, at 9:56 AM, Magnus Hagander wrote: Most likely it's just checking the keytab to find a principal with the same name as the one presented from the client. Since one is present, it loads it up automatically, and verifies against it. Bingo! The server uses the keytab to decrypt the token provided by the client. By using the GSS_C_NO_CREDENTIAL arg on the server anything put in the keytab is OK. (The server doesn't need to authenticate itself to Kerberos, it just accepts authentication. Mutual authentication is done using the same keys.) The documentation needs to reflect that. I agree there's some disconnect there between the documentation and the apparent implementation but I'm not sure I'm in favor of changing the documentation on this one. Personally, I'd rather it return an error if someone tries to use GSS_C_NO_CREDENTIAL when accepting a context than to just be happy using anything in the keytab. The real question is whether we *need* to check anything. If the keytab is unique to PostgreSQL, as I think it should be, then I think the admin should put only the right stuff in the keytab, and no further checks are needed. While I agree that in general it's best to have a keytab file for each separate service, there's no guarantee of that being done and using, say, the 'host' service to allow authentication to PG when the PG service is 'postgres' isn't right. Now it *might* make sense to check that the credential that's accepted actually has some correspondence to what ./configure said. Or what's configured in the postgres.conf (as is done for Kerberos now...). If we do do that, then we need to allow for the ways Microsoft mucks with the case of the name. (Kerberos is supposed to be case sensitive, but Microsoft doesn't work that way.)
In particular I think we may need both postgres/server and POSTGRES/server in the keytab in order to support the to-be-written native Windows SSPI client at the same time as the current Kerberos 5 and GSSAPI Unix clients. Supporting multiple, specific, keys might be an interesting challenge, but I'm not too keen on worrying about it right now regardless. I'd also much rather err on the side of overly paranoid than "if it works, just let it in". If someone ends up having to support both windows SSPI clients and unix Kerberos/GSSAPI clients it's entirely possible to suggest they just make it POSTGRES and configure the clients accordingly. Now what happens with non-Kerberos 5 mechanisms like SPKM and SPNEGO? ;-) I'll have to see, even if it doesn't matter for Java, yet. Err, SPNEGO's not really a full mechanism itself, it's got sub-mechs of NTLM and Kerberos and from what I've seen it does the same thing SSPI does and assumes uppercase. Again, it's ugly, but not unbearable. Such fun things are probably also why mod-auth-kerb for Apache lets you configure the service name, like most other Kerberos servers... I've yet to run into one that will just use any key in the keytab it's given. Thanks, Stephen signature.asc Description: Digital signature
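As a concrete sketch of the "keytab unique to PostgreSQL" setup discussed above, using MIT Kerberos admin tools (the principal, host, and path here are examples; an Active Directory environment would use setspn/ktpass instead):

```shell
# On the KDC: create the service principal and export its key into a
# keytab that only PostgreSQL reads.
kadmin -q "addprinc -randkey postgres/db.example.com"
kadmin -q "ktadd -k /etc/postgresql/krb5.keytab postgres/db.example.com"
# To also serve Windows SSPI clients expecting uppercase, an additional
# POSTGRES/db.example.com principal could be added to the same keytab.
```

Keeping the keytab PostgreSQL-only is what makes "accept any key found in it" even arguably safe; a shared /etc/krb5.keytab containing host/ keys would not be.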
Re: [HACKERS] [PATCHES] array_accum aggregate
* Tom Lane ([EMAIL PROTECTED]) wrote: That's not really the flavor of solution I'd like to have. Ideally, it'd actually *work* to write my_ffunc(my_sfunc(my_sfunc(null, 1), 2)) and get the same result as aggregating over the values 1 and 2. The trick is to make sure that my_sfunc and my_ffunc can only be used together. Maybe we do need a type for each such aggregate ... In general I like this idea but there are some complications, the main one being where would the memory be allocated? I guess we could just always use the QueryContext. The other issue is, in the above scenario is it acceptable to modify the result of my_sfunc(null, 1) in the ',2' call? Normally it's unacceptable to modify your input, aggregates get a special exception when called as an aggregate, but in this case you'd have to copy the entire custom structure underneath, I'd think. As for a type for each such aggregate, that seems reasonable to me, honestly. I'd like for the type creation to be reasonably simple though, if possible. ie: could we just provide NULLs for the input/output functions instead of having to implement functions that just return an error? Or perhaps have a polymorphic function which can be reused that just returns an error. Additionally, we'd have to be able to mark the types as being polymorphic along the same lines as anyelement/anyarray. Hopefully that's already easy, but if not it'd be nice if it could be made easy... Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] [PATCHES] array_accum aggregate
* Tom Lane ([EMAIL PROTECTED]) wrote: Stephen Frost [EMAIL PROTECTED] writes: The other issue is, in the above scenario is it acceptable to modify the result of my_sfunc(null, 1) in the ,2 call? Yes, because the only place a nonnull value of the type could have come from is a my_sfunc call; since it's a pseudotype, we don't allow it on disk. (We might also need a hack to prevent the type from being used as a record-type component ... not sure if that comes for free with being a pseudotype currently.) Ah, ok. As for a type for each such aggregate, that seems reasonable to me, honestly. The ugly part is that we'd still need a way for the planner to recognize this class of types. Hrm, true. I would agree that they would need more memory than most aggregates. It seems likely to me that worst offenders are going to be of the array_accum type- store all tuples coming in. Therefore, my suggestion would be that the memory usage be estimated along those lines and allow for hashagg when there's enough memory to keep all the tuples (or rather the portion of the tuple going into the aggregate) in memory (plus some amount of overhead, maybe 10%). That doesn't help with how to get the planner to recognize this class of types though. I don't suppose we already have a class framework which the planner uses to group types which have similar characteristics? Additionally, we'd have to be able to mark the types as being polymorhpic along the same lines as anyelement/anyarray. What for? So that the finalfunc can be polymorphic along the lines of my suggested aaccum_sfunc(anyarray,anyelement) returns anyarray and aaccum_ffunc(anyarray) returns anyarray. If the state type isn't polymorphic then PG won't let me create a function from it which returns a polymorphic, aiui. Thanks, Stephen signature.asc Description: Digital signature
Re: [PATCHES] array_accum aggregate
* Neil Conway ([EMAIL PROTECTED]) wrote: On Wed, 2006-10-11 at 00:51 -0400, Stephen Frost wrote: Here, the actual state type for any aggregate call is the array type !having the actual input type as elements. Note: array_accum() is now !a built-in aggregate which uses a much more efficient mechanism than !that which is provided by array_append, prior users of array_accum() !may be pleasantly surprised at the marked improvement for larger arrays. /para Not worth documenting, I think. Ok. Either way is fine with me, honestly. *** 15,20 --- 15,22 #include "utils/array.h" #include "utils/builtins.h" #include "utils/lsyscache.h" + #include "utils/memutils.h" + #include "nodes/execnodes.h" Include directives should be sorted roughly alphabetically, with the exception of postgres.h and system headers. Ah, yeah, you know I noticed that when I added memutils but wasn't thinking when I went back and added execnodes (I had been trying to minimize the number of includes by adding ones I needed one at a time and seeing if any more were necessary). + /* Structure, used by aaccum_sfunc and aaccum_ffunc to + * implement the array_accum() aggregate, for storing + * pointers to the ArrayBuildState for the array we are + * building and the MemoryContext in which it is being + * built. Note that this structure is + * considered an 'anyarray' externally, which is a + * variable-length datatype, and therefore + * must open with an int32 defining the length. */ Gross. Indeed. :/ + /* Make sure we are being called in an aggregate. */ + if (!fcinfo->context || !IsA(fcinfo->context, AggState)) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), +errmsg("Can not call aaccum_sfunc as a non-aggregate"), +errhint("Use the array_accum aggregate"))); Per the error message style guidelines, errmsg() should begin with a lowercase letter and errhint() should be a complete sentence (so it needs punctuation, for example). Ah, ok, guess I should go read those.
+ Datum + aaccum_ffunc(PG_FUNCTION_ARGS) + { + aaccum_info *ainfo; + + /* Check if we are passed in a NULL */ + if (PG_ARGISNULL(0)) PG_RETURN_ARRAYTYPE_P(NULL); There is no guarantee that SQL NULL and PG_RETURN_XYZ(NULL) refer to the same thing -- use PG_RETURN_NULL() to return a SQL NULL value, or just make the function strict. Huh, alright. I'll probably just change it to PG_RETURN_NULL(). Thanks! Stephen signature.asc Description: Digital signature
Re: [PATCHES] array_accum aggregate
* Tom Lane ([EMAIL PROTECTED]) wrote: (However, now that we support nulls in arrays, meseems a more consistent definition would be that it allows null inputs and just includes them in the output. So probably you do need it non-strict.) This was my intention. I'm inclined to think that this example demonstrates a deficiency in the aggregate-function design: there should be a way to declare what we're really doing. But I don't know exactly what that should look like. I agree and would much rather have a clean solution which works with the design than one which has to work outside it. When I first was trying to decide on the state-type I was looking through the PG catalogs for essentially a complex C type which translated to a void*. Perhaps such a type could be added. Unless that's considered along the lines of an 'any' type it'd cause problems for the polymorphism aspect. Another alternative would be to provide a separate area for each aggregate to put any other information it needs. This would almost certainly only be available to C functions but would essentially be a void* which is provided through the AggState structure but tracked by the aggregator routines and reset for each aggregate function being run. If that's acceptable, I don't think it'd be all that difficult to implement. With that, aaccum_sfunc and aaccum_ffunc would ignore the state variable passed to them in favor of their custom structure available through fcinfo->AggState (I expect they'd just keep the state variable NULL and be marked non-strict, or set it to some constant if necessary). The pointer would have to be tracked somewhere and then copied in/out on each call, but that doesn't seem too difficult to me. After all, the state variable is already being tracked somewhere, this would just sit next to it, in my head anyway. I've got some time this weekend and would be happy to take a shot at the second proposal if that's generally acceptable.
Thanks, Stephen signature.asc Description: Digital signature
[PATCHES] array_accum aggregate
Greetings, Please find below a patch to add the array_accum aggregate as a built-in using two new C functions defined in array_userfuncs.c. These functions simply expose the pre-existing efficient array building routines used elsewhere in the backend (accumArrayResult and makeArrayResult, specifically). An array_accum aggregate has existed in the documentation for quite some time using the inefficient (for larger arrays) array_append routine. The documentation around the example has also been updated to reflect the addition of this built-in. Documentation and a regression test are also included. Thanks, Stephen Index: doc/src/sgml/func.sgml === RCS file: /projects/cvsroot/pgsql/doc/src/sgml/func.sgml,v retrieving revision 1.343 diff -c -r1.343 func.sgml *** doc/src/sgml/func.sgml 1 Oct 2006 18:54:31 - 1.343 --- doc/src/sgml/func.sgml 11 Oct 2006 04:38:45 - *** *** 7851,7856 --- 7851,7872 row entry indexterm + primaryarray_accum/primary +/indexterm +functionarray_accum(replaceable class=parameteranyelement/replaceable)/function + /entry + entry +typeanyelement/type + /entry + entry + array of elements of same type as argument type + /entry + entryan array of all input elements (NULLs, non-nulls, and duplicates)/entry + /row + + row + entry +indexterm primaryaverage/primary /indexterm functionavg(replaceable class=parameterexpression/replaceable)/function Index: doc/src/sgml/xaggr.sgml === RCS file: /projects/cvsroot/pgsql/doc/src/sgml/xaggr.sgml,v retrieving revision 1.33 diff -c -r1.33 xaggr.sgml *** doc/src/sgml/xaggr.sgml 16 Sep 2006 00:30:16 - 1.33 --- doc/src/sgml/xaggr.sgml 11 Oct 2006 04:38:45 - *** *** 132,138 /programlisting Here, the actual state type for any aggregate call is the array type !having the actual input type as elements. /para para --- 132,141 /programlisting Here, the actual state type for any aggregate call is the array type !having the actual input type as elements. 
Note: array_accum() is now !a built-in aggregate which uses a much more efficient mechanism than !that which is provided by array_append, prior users of array_accum() !may be pleasantly suprised at the marked improvment for larger arrays. /para para Index: src/backend/utils/adt/array_userfuncs.c === RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/array_userfuncs.c,v retrieving revision 1.20 diff -c -r1.20 array_userfuncs.c *** src/backend/utils/adt/array_userfuncs.c 14 Jul 2006 14:52:23 - 1.20 --- src/backend/utils/adt/array_userfuncs.c 11 Oct 2006 04:38:46 - *** *** 15,20 --- 15,22 #include utils/array.h #include utils/builtins.h #include utils/lsyscache.h + #include utils/memutils.h + #include nodes/execnodes.h /*- *** *** 399,404 --- 401,516 PG_RETURN_ARRAYTYPE_P(result); } + /* Structure, used by aaccum_sfunc and aaccum_ffunc to + * implement the array_accum() aggregate, for storing + * pointers to the ArrayBuildState for the array we are + * building and the MemoryContext in which it is being + * built. Note that this structure is + * considered an 'anyarray' externally, which is a + * variable-length datatype, and therefore + * must open with an int32 defining the length. */ + typedef struct { + int32vl_len; + ArrayBuildState *astate; + MemoryContextarrctx; + } aaccum_info; + + /*- + * aaccum_sfunc : + * State transistion function for the array_accum() aggregate, + * efficiently builds an in-memory array by working in blocks and + * minimizing realloc()'s and copying of the data in general. + * Creates a seperate memory context attached to the AggContext into + * which the array is built. That context is free'd when the final + * function is called (aaccum_ffunc). accumArrayResult() does all + * the heavy lifting here, this is really just a glue function. + * + */ + Datum + aaccum_sfunc(PG_FUNCTION_ARGS) + { + aaccum_info *ainfo; + AggState*aggstate; + + /* Make sure we are being called in an aggregate. 
*/ + if (!fcinfo->context || !IsA(fcinfo->context, AggState)) +
[PATCHES] BUG #2246: Only call pg_fe_getauthname if none given
* Tom Lane ([EMAIL PROTECTED]) wrote: Right offhand I like the idea of pushing it into connectOptions2 --- can you experiment with that? Seems like there is no reason to call Kerberos if the user supplies the name to connect as. Patch attached. After looking through the code around this I discovered that conninfo_parse is also called by PQconndefaults. This patch changes the 'default' returned for user to NULL (instead of whatever pg_fe_getauthname returns). This is probably better than not calling Kerberos in pg_fe_getauthname and potentially returning the wrong thing (if the username doesn't match the princ, a common situation), but it might break existing applications. Are those applications wrong to be asking libpq to provide what it thinks the username is when asking for the defaults? I'd say probably yes, but then we don't provide another way for them to get it either. Patch tested w/ 'trust', 'ident', 'md5' and 'krb5' methods from psql. Enjoy, Stephen Index: src/interfaces/libpq/fe-connect.c === RCS file: /projects/cvsroot/pgsql/src/interfaces/libpq/fe-connect.c,v retrieving revision 1.326 diff -c -r1.326 fe-connect.c *** src/interfaces/libpq/fe-connect.c 13 Feb 2006 22:33:57 - 1.326 --- src/interfaces/libpq/fe-connect.c 15 Feb 2006 18:21:06 - *** *** 96,105 * fallback is available. If after all no value can be determined * for an option, an error is returned. * - * The value for the username is treated specially in conninfo_parse. - * If the Compiled-in resource is specified as a NULL value, the - * user is determined by pg_fe_getauthname(). - * * The Label and Disp-Char entries are provided for applications that * want to use PQconndefaults() to create a generic database connection * dialog. Disp-Char is defined as follows: --- 96,101 *** *** 423,434 --- 419,448 * * Compute derived connection options after absorbing all user-supplied info. * + * The value for the username is treated specially in connectOptions2. 
+ * If the Compiled-in resource is specified as a NULL value and the user + * doesn't provide a username to use then the user is determined by + * calling pg_fe_getauthname(). + * * Returns true if OK, false if trouble (in which case errorMessage is set * and so is conn-status). */ static bool connectOptions2(PGconn *conn) { + charerrortmp[PQERRORMSG_LENGTH]; + + /* +* Find username to use if none given. This may call +* additional helpers (ie: krb5) if those auth methods +* are compiled in. +*/ + if (conn-pguser == NULL || conn-pguser[0] == '\0') + { + conn-pguser = pg_fe_getauthname(errortmp); + /* note any error message is thrown away */ + } + /* * If database name was not given, default it to equal user name */ *** *** 2505,2511 char *cp2; PQconninfoOption *options; PQconninfoOption *option; - charerrortmp[PQERRORMSG_LENGTH]; /* Make a working copy of PQconninfoOptions */ options = malloc(sizeof(PQconninfoOptions)); --- 2519,2524 *** *** 2722,2737 } continue; } - - /* -* Special handling for user -*/ - if (strcmp(option-keyword, user) == 0) - { - option-val = pg_fe_getauthname(errortmp); - /* note any error message is thrown away */ - continue; - } } return options; --- 2735,2740 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] BUG #2246: Only call pg_fe_getauthname if none given
* Tom Lane ([EMAIL PROTECTED]) wrote: This is probably not a good idea --- changing the API behavior in pursuit of saving a few cycles is just going to get people mad at us. Fair enough. I think we'd have to refactor the code so that PQsetdbLogin gets a PQconninfoOption array, overrides values *in that array*, then calls the fallback-substitution code etc. Not sure if it's worth the trouble. The extra complexity of searching the array for values to override could eat up the cycles we're hoping to save, too :-( Perhaps I'm missing something obvious (and if so, I'm sorry) but couldn't we just build up the character array in PQsetdbLogin to be passed in to connectOptions1? If we do that we could probably also merge the two connectOptions... That would simplify things a great deal I think and would also avoid the extra processing to pick up the 'defaults'... Stephen signature.asc Description: Digital signature
Re: [PATCHES] [HACKERS] Krb5 multiple DB connections
* Tom Lane ([EMAIL PROTECTED]) wrote: Stephen Frost [EMAIL PROTECTED] writes: I have little idea of how expensive the operations called by pg_krb5_init really are. If they are expensive then it'd probably make sense to keep the current static variables but treat 'em as a one-element cache, ie, recompute if a new user name is being demanded. If not, we ought to be able to simplify some things. We'd have to recompute based on the KRB5CCNAME environment variable changing, which is certainly an option. It's not necessarily the case that the username is changing, possibly just the cache. Hm, apparently I completely misunderstand the problem here. What I thought the bug was, was that the cache wasn't recomputed given an attempt to connect as a different Postgres username than the first time. If that's not the issue, then what is? The specific problem which I and the original reporter ran into is this:

KRB5CCNAME=/tmp/krb5cc_apache_aev0kF
pg_connect() -- works fine
pg_close() -- works fine
rm /tmp/krb5cc_apache_aev0kF
KRB5CCNAME=/tmp/krb5cc_apache_cVMRtA
pg_connect() -- doesn't work; the Kerberos error is "no credentials cache"

What's happening here is that for every client connection to Apache a new credentials cache is created, and then destroyed when the connection closes. When using PHP (ie: phppgadmin) and mod_php (as is common), the Apache process is the one actually making the connection to the database, and a given Apache process usually serves multiple requests in its lifetime, sometimes to the same user, sometimes to different users. The static variables being used are for: krb5_init_context, krb5_cc_default, krb5_cc_get_principal, and krb5_unparse_name. Technically, between one connection and the next, krb5_cc_default, krb5_cc_get_principal and krb5_unparse_name could reasonably return different values. krb5_init_context is pretty unlikely to change, as that would mean /etc/krb5.conf changed.
Looking through the krb5 source code it appears that the only one which checks for something existing is krb5_cc_default_name (called by krb5_cc_default), which will just return the current ccache name if one has been set. (src/lib/krb5/os/ccdefname.c:278, krb5-1.4.3) We initially brought up this issue with the Kerberos folks actually: http://pch.mit.edu/pipermail/kerberos/2006-February/009225.html They pretty clearly felt that the application was responsible for handling the cache in the event it changes. Unfortunately, it's not the application which is talking to Kerberos in this case but another library, which doesn't expose the Kerberos API to the application in such a way as to allow the application to notify Kerberos of the cache change. Looking through the Kerberos API again, it looks like it might be possible to use krb5_cc_set_default_name(context, NULL) to force krb5_cc_default_name() (from krb5_cc_default()) to re-find the cache. Finding the cache again is reasonably inexpensive. I've tried a couple of things along these lines now but I haven't found a workable solution which doesn't reinitialize the main Kerberos context, unfortunately. I'll keep working on it as time allows, though honestly I don't believe it's generally terribly expensive to reinitialize the context... Sorry it took so long to reply; running down the paths through the various libraries takes a bit of time, and I was really hoping to be able to suggest an alternative solution using krb5_cc_set_default_name. Thanks, Stephen
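The "recompute based on KRB5CCNAME changing" idea discussed above can be sketched as a one-element cache keyed on the environment variable. This is an illustrative sketch only: init_count stands in for the real krb5_init_context()/krb5_cc_default() setup work, which is not performed here.

```c
#include <stdlib.h>
#include <string.h>

/* One-element cache keyed on KRB5CCNAME: redo the (stubbed) krb5
 * setup only when the ccache name differs from the one last seen. */
static char cached_ccname[256];
static int  initialised = 0;
int init_count = 0;             /* counts how many (re)initialisations ran */

void ensure_krb5_state(void)
{
    const char *cc = getenv("KRB5CCNAME");
    if (cc == NULL)
        cc = "";                /* fall back to the library default */
    if (initialised && strcmp(cc, cached_ccname) == 0)
        return;                 /* same ccache: reuse the cached state */
    /* ccache changed (e.g. Apache handed us a new credentials file):
     * here the real code would redo krb5_cc_default() and friends. */
    strncpy(cached_ccname, cc, sizeof(cached_ccname) - 1);
    initialised = 1;
    init_count++;
}
```

This is the shape of Tom's "one-element cache" suggestion, keyed on the ccache name rather than the username.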
Re: [PATCHES] [HACKERS] Krb5 multiple DB connections
* Tom Lane ([EMAIL PROTECTED]) wrote: Stephen Frost [EMAIL PROTECTED] writes: The attached patch fixes a bug which was originally brought up in May of 2002 in this thread: Now that I've looked at it, I find this patch seems fairly wrongheaded. AFAICS the entire point of the original coding is to allow the setup work needed to create the krb5_context etc to be amortized across multiple connections. The patch destroys that advantage, but yet keeps as much as it can of the notational cruft induced by the original design -- for instance, there seems little point in the pg_krb5_initialised flag if we aren't ever going to have any pre-initialized state. I'm honestly not entirely sure I agree about that being the point of the original coding but, regardless, the crux of the problem here is that there's no way to get libpq to use a cache other than the one it's initialized with for a given session. That part of the Kerberos API which supports that isn't exposed in any way beyond the KRB5CCNAME environment variable, and the call to ask for the 'default' ccache is called with the static variable the second time and it ignores the request when there's an already valid (it thinks) ccache. I have little idea of how expensive the operations called by pg_krb5_init really are. If they are expensive then it'd probably make sense to keep the current static variables but treat 'em as a one-element cache, ie, recompute if a new user name is being demanded. If not, we ought to be able to simplify some things. We'd have to recompute based on the KRB5CCNAME environment variable changing, which is certainly an option. It's not necessarily the case that the username is changing, possibly just the cache.
Additionally, the calls themselves are not very expensive when being called on an existing cache, the most expensive thing is reaching out to the KDC to get a new service ticket which will either need to be done, or won't, depending on if a valid service ticket already exists in the cache or not. Another point here is how all this interacts with thread safety. If we get rid of the static variables, do we still need the pglock_thread() operations? Good question, I'm afraid probably not. I'd have to look through it again but last I checked MIT Kerberos prior to 1.4 (and I'm not 100% sure it's resolved in 1.4) wasn't threadsafe itself. I'd certainly be happy to rework the patch based on these comments, of course. Honestly, I'm pretty sure the original patch was intended to be minimal (and is for the most part). These changes would introduce more logic but if that's alright I'd be happy to do it. Thanks, Stephen signature.asc Description: Digital signature
Re: [PATCHES] [HACKERS] Krb5 multiple DB connections
Greetings, The attached patch fixes a bug which was originally brought up in May of 2002 in this thread: http://archives.postgresql.org/pgsql-interfaces/2002-05/msg00083.php The original bug reporter also supplied a patch to fix the problem: http://archives.postgresql.org/pgsql-interfaces/2002-05/msg00090.php I ran into exactly the same issue using 8.1.2. I've updated the original patch to work for HEAD and 8.1.2. I've tested the patch on both HEAD and 8.1.2, both at home and at work, and it works quite nicely. In fact, I hope to have a patch to phppgadmin which will make it properly handle Kerberized logins. Thanks, Stephen Index: src/interfaces/libpq/fe-auth.c === RCS file: /projects/cvsroot/pgsql/src/interfaces/libpq/fe-auth.c,v retrieving revision 1.110 diff -c -r1.110 fe-auth.c *** src/interfaces/libpq/fe-auth.c 26 Dec 2005 14:58:05 - 1.110 --- src/interfaces/libpq/fe-auth.c 5 Feb 2006 20:03:21 - *** *** 101,122 * Various krb5 state which is not connection specific, and a flag to * indicate whether we have initialised it yet. */ static intpg_krb5_initialised; static krb5_context pg_krb5_context; static krb5_ccache pg_krb5_ccache; static krb5_principal pg_krb5_client; static char *pg_krb5_name; static int ! pg_krb5_init(char *PQerrormsg) { krb5_error_code retval; ! if (pg_krb5_initialised) return STATUS_OK; ! retval = krb5_init_context(pg_krb5_context); if (retval) { snprintf(PQerrormsg, PQERRORMSG_LENGTH, --- 101,133 * Various krb5 state which is not connection specific, and a flag to * indicate whether we have initialised it yet. */ + /* static intpg_krb5_initialised; static krb5_context pg_krb5_context; static krb5_ccache pg_krb5_ccache; static krb5_principal pg_krb5_client; static char *pg_krb5_name; + */ + + struct krb5_info + { + int pg_krb5_initialised; + krb5_contextpg_krb5_context; + krb5_ccache pg_krb5_ccache; + krb5_principal pg_krb5_client; + char*pg_krb5_name; + }; static int ! 
pg_krb5_init(char *PQerrormsg, struct krb5_info *info) { krb5_error_code retval; ! if (info-pg_krb5_initialised) return STATUS_OK; ! retval = krb5_init_context((info-pg_krb5_context)); if (retval) { snprintf(PQerrormsg, PQERRORMSG_LENGTH, *** *** 125,170 return STATUS_ERROR; } ! retval = krb5_cc_default(pg_krb5_context, pg_krb5_ccache); if (retval) { snprintf(PQerrormsg, PQERRORMSG_LENGTH, pg_krb5_init: krb5_cc_default: %s\n, error_message(retval)); ! krb5_free_context(pg_krb5_context); return STATUS_ERROR; } ! retval = krb5_cc_get_principal(pg_krb5_context, pg_krb5_ccache, ! pg_krb5_client); if (retval) { snprintf(PQerrormsg, PQERRORMSG_LENGTH, pg_krb5_init: krb5_cc_get_principal: %s\n, error_message(retval)); ! krb5_cc_close(pg_krb5_context, pg_krb5_ccache); ! krb5_free_context(pg_krb5_context); return STATUS_ERROR; } ! retval = krb5_unparse_name(pg_krb5_context, pg_krb5_client, pg_krb5_name); if (retval) { snprintf(PQerrormsg, PQERRORMSG_LENGTH, pg_krb5_init: krb5_unparse_name: %s\n, error_message(retval)); ! krb5_free_principal(pg_krb5_context, pg_krb5_client); ! krb5_cc_close(pg_krb5_context, pg_krb5_ccache); ! krb5_free_context(pg_krb5_context); return STATUS_ERROR; } ! pg_krb5_name = pg_an_to_ln(pg_krb5_name); ! pg_krb5_initialised = 1; return STATUS_OK; } /* * pg_krb5_authname -- returns a pointer to static space containing whatever --- 136,191 return STATUS_ERROR; } ! retval = krb5_cc_default(info-pg_krb5_context, (info-pg_krb5_ccache)); if (retval) { snprintf(PQerrormsg, PQERRORMSG_LENGTH, pg_krb5_init: krb5_cc_default: %s\n, error_message(retval)); ! krb5_free_context(info-pg_krb5_context); return STATUS_ERROR; } ! retval = krb5_cc_get_principal(info-pg_krb5_context, info-pg_krb5_ccache, ! (info-pg_krb5_client));
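The core of the refactor in the patch above, moving the formerly-static krb5 state into a struct owned by each connection attempt, reduces to something like this simplified sketch. Field types and the init work are stand-ins for illustration; the real struct carries krb5_context, krb5_ccache, krb5_principal and the unparsed name.

```c
#include <string.h>

/* Simplified stand-in for the patch's struct krb5_info: each
 * connection attempt owns its own copy, so two connections using
 * different credential caches no longer share hidden static state. */
struct krb5_info
{
    int  initialised;
    char name[64];              /* stands in for pg_krb5_name */
};

int krb5_info_init(struct krb5_info *info, const char *principal)
{
    if (info->initialised)
        return 0;               /* already set up for this attempt */
    strncpy(info->name, principal, sizeof(info->name) - 1);
    info->name[sizeof(info->name) - 1] = '\0';
    info->initialised = 1;
    return 0;
}
```

Because the flag lives inside the struct, it only short-circuits repeated init calls within one connection attempt, which is the distinction Tom questions above.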
Re: [PATCHES] pg_restore COPY error handling
* Tom Lane ([EMAIL PROTECTED]) wrote: Stephen Frost [EMAIL PROTECTED] writes: I believe the attached patch does this now. Under my test case it correctly handled things. I'm certainly happier with it this way and apologize for not realizing this better approach sooner. Please comment. Applied (with trivial stylistic changes) as far back as 8.0, which was the first release that would try to continue after an error. Great, thanks! Now if I can convince you to look at the Kerberos patch I posted on -hackers... ;) Thanks again, Stephen signature.asc Description: Digital signature
Re: [PATCHES] pg_restore COPY error handling
* Tom Lane ([EMAIL PROTECTED]) wrote: I agree. I wonder if it wouldn't be cleaner to pass the information in the other direction, ie, send a boolean down to PrintTocData saying you are sending SQL commands or you are sending COPY data. Then, instead of depending only on the libpq state to decide what to do in ExecuteSqlCommandBuf, we could cross-check: if we're sending SQL data and the libpq state is wrong, just discard the line. I believe the attached patch does this now. Under my test case it correctly handled things. I'm certainly happier with it this way and apologize for not realizing this better approach sooner. Please comment. Thanks! Stephen Index: src/bin/pg_dump/pg_backup_archiver.c === RCS file: /projects/cvsroot/pgsql/src/bin/pg_dump/pg_backup_archiver.c,v retrieving revision 1.119 diff -c -r1.119 pg_backup_archiver.c *** src/bin/pg_dump/pg_backup_archiver.c21 Jan 2006 02:16:20 - 1.119 --- src/bin/pg_dump/pg_backup_archiver.c3 Feb 2006 02:47:33 - *** *** 330,339 --- 330,345 * with libpq. 
*/ if (te-copyStmt strlen(te-copyStmt) 0) + { ahprintf(AH, %s, te-copyStmt); + AH-writingCopy = 1; + } (*AH-PrintTocDataPtr) (AH, te, ropt); + if (te-copyStmt strlen(te-copyStmt) 0) + AH-writingCopy = 0; + _enableTriggersIfNecessary(AH, te, ropt); } } *** *** 1590,1595 --- 1596,1602 AH-compression = compression; AH-pgCopyBuf = createPQExpBuffer(); + AH-writingCopy = 0; AH-sqlBuf = createPQExpBuffer(); /* Open stdout with no compression for AH output handle */ Index: src/bin/pg_dump/pg_backup_archiver.h === RCS file: /projects/cvsroot/pgsql/src/bin/pg_dump/pg_backup_archiver.h,v retrieving revision 1.68 diff -c -r1.68 pg_backup_archiver.h *** src/bin/pg_dump/pg_backup_archiver.h15 Oct 2005 02:49:38 - 1.68 --- src/bin/pg_dump/pg_backup_archiver.h3 Feb 2006 02:47:33 - *** *** 245,250 --- 245,251 int loFd; /* BLOB fd */ int writingBlob;/* Flag */ + int writingCopy;/* Flag to indicate if we are in COPY mode */ int blobCount; /* # of blobs restored */ char *fSpec; /* Archive File Spec */ Index: src/bin/pg_dump/pg_backup_db.c === RCS file: /projects/cvsroot/pgsql/src/bin/pg_dump/pg_backup_db.c,v retrieving revision 1.66 diff -c -r1.66 pg_backup_db.c *** src/bin/pg_dump/pg_backup_db.c 15 Oct 2005 02:49:38 - 1.66 --- src/bin/pg_dump/pg_backup_db.c 3 Feb 2006 02:47:33 - *** *** 389,395 *- */ ! if (PQputline(AH-connection, AH-pgCopyBuf-data) != 0) die_horribly(AH, modulename, error returned by PQputline\n); resetPQExpBuffer(AH-pgCopyBuf); --- 389,395 *- */ ! if (AH-pgCopyIn PQputline(AH-connection, AH-pgCopyBuf-data) != 0) die_horribly(AH, modulename, error returned by PQputline\n); resetPQExpBuffer(AH-pgCopyBuf); *** *** 400,406 if (isEnd) { ! if (PQendcopy(AH-connection) != 0) die_horribly(AH, modulename, error returned by PQendcopy\n); AH-pgCopyIn = 0; --- 400,406 if (isEnd) { ! 
if (AH-pgCopyIn PQendcopy(AH-connection) != 0) die_horribly(AH, modulename, error returned by PQendcopy\n); AH-pgCopyIn = 0; *** *** 615,621 /* Could switch between command and COPY IN mode at each line */ while (qry eos) { ! if (AH-pgCopyIn) qry = _sendCopyLine(AH, qry, eos); else qry = _sendSQLLine(AH, qry, eos); --- 615,624 /* Could switch between command and COPY IN mode at each line */ while (qry eos) { ! /* If we are in CopyIn mode *or*
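The writingCopy cross-check added by the patch above can be shown as a minimal dispatcher: the archiver's flag says what it thinks it is sending, libpq's state says whether COPY IN was actually entered, and data lines for a failed COPY get discarded instead of being executed as SQL. The counters here are illustrative stand-ins for the real PQputline()/PQexec() calls.

```c
/* Stand-in counters for the three possible fates of a restore line. */
int sent_as_copy = 0, executed_as_sql = 0, discarded = 0;

/* writing_copy: archiver says this line is COPY data.
 * copy_in:      libpq says a COPY IN is actually in progress. */
void dispatch_line(int writing_copy, int copy_in)
{
    if (writing_copy && copy_in)
        sent_as_copy++;         /* normal COPY data: PQputline() */
    else if (writing_copy && !copy_in)
        discarded++;            /* the COPY command failed: swallow it */
    else
        executed_as_sql++;      /* ordinary SQL command: PQexec() */
}
```

The bug being fixed is exactly the middle case: without the cross-check, those lines fell through to the SQL path.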
Re: [PATCHES] pg_restore COPY error handling
* Bruce Momjian (pgman@candle.pha.pa.us) wrote: Stephen Frost wrote: * Bruce Momjian (pgman@candle.pha.pa.us) wrote: Stephen Frost wrote: Great! It'd be really nice to have this fix in 8.1.3... :) No, it will not be in 8.1.3. It is a new feature. I'd really appreciate it if you could review my post to pgsql-bugs, Message-ID: [EMAIL PROTECTED]. If this patch is considered a new feature then I'd like to either revisit that decision or be given some direction on what a bug-fix patch for the grows-to-3G bug would look like. Oh, so when it skips the \., it reads the rest of the file and bombs out? I thought it just continued fine. How is that failure happening? It doesn't actually bomb out either. On the system I was working with, what happened was that it hung after reading in the COPY data. It looked like it got stuck somehow while trying to parse the data as an SQL command. Thanks, Stephen
Re: [PATCHES] pg_restore COPY error handling
* Andrew Dunstan ([EMAIL PROTECTED]) wrote: this is a hopeless way of giving a reference. Many users don't keep list emails. If you want to refer to a previous post you should give a reference to the web archives. Sorry, I'm actually pretty used to using Message-IDs for references (we do it a lot on the Debian lists). I'll try to be better in the future, though honestly I'm not really a big fan of the Postgres mailing list archive setup. :/ I assume you are referring to this post: http://archives.postgresql.org/pgsql-bugs/2006-01/msg00188.php Yup, that's the one, thanks. Stephen
Re: [PATCHES] pg_restore COPY error handling
* Tom Lane ([EMAIL PROTECTED]) wrote: Bruce Momjian pgman@candle.pha.pa.us writes: Andrew Dunstan wrote: I assume you are referring to this post: http://archives.postgresql.org/pgsql-bugs/2006-01/msg00188.php OK, that helps. The solution is to not do that, meaning install postgis before the restore or something. Anyway, adding the patch seems like too large a risk for a minor release, especially since you are the first person to complain about it that I remember. I don't think you should see this as something specific to PostGIS. If I interpret Stephen's report properly, any sort of failure during COPY IN would lead to undesirable behavior in pg_restore. I'm reasonably confident this is exactly the case. This is not super surprising because the original design approach for pg_restore was bomb out on any sort of difficulty whatsoever. That was justly complained about, and now it will try to keep going after SQL-command errors ... but it sounds like the COPY-data processing part didn't get fixed properly. Exactly so. Possibly because it appears to be non-trivial to detect when it was a *COPY* command which failed (to differentiate it from some other command). What I did was check for '<whitespace>COPY'. I'd be happy to change that if anyone has a better suggestion though. There might be a way to pass that information down into the pg_archive_db but I don't think it's available at that level currently, and I didn't see any way to get the answer from libpq about what *kind* of command failed. I take no position on whether Stephen's patch is any good --- I haven't looked at it --- but if he's correct about the problem then I agree it's a bug fix. Before deciding whether it deserves to be back-patched, we at least ought to look closely enough to characterize the situations in which pg_restore will fail.
I have some level of confidence that this is the only case along these lines because it's the only case where pg_restore isn't dealing with SQL commands but with a dataset which has to be handled in a special way. pg_restore checks if the command sent was a successful COPY command and then treats everything after it in a special way; it doesn't do that for any other commands... Thanks, Stephen signature.asc Description: Digital signature
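The detection Stephen describes (skip leading whitespace, then compare case-insensitively against a "COPY " prefix) works as a standalone predicate:

```c
#include <ctype.h>
#include <strings.h>

/* Return nonzero if the failed statement looks like a COPY command:
 * skip leading whitespace, then test for a case-insensitive "COPY "
 * prefix, matching the check the patch adds in pg_backup_db.c. */
int looks_like_copy(const char *stmt)
{
    while (isspace((unsigned char) *stmt))
        stmt++;
    return strncasecmp(stmt, "COPY ", 5) == 0;
}
```

Requiring the trailing space keeps identifiers that merely start with "copy" from matching.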
Re: [PATCHES] pg_restore COPY error handling
* Andrew Dunstan ([EMAIL PROTECTED]) wrote: Tom Lane said: Also, it might be possible to depend on whether libpq has entered the copy in state. I'm not sure this works very well, because if there's an error in the COPY command itself (not in the following data) then we probably don't ever get into copy in state. Could we not refrain from sending data if we detect that we are not in copy in state? You have to know at that level if it's data you're getting or an SQL statement though. SQL statements can fail and things move along happily. Essentially what my patch does is detect when a COPY command fails and then circular-file the data that comes in after it from the upper-level. When I started to look at the way pg_restore worked I had initially thought I could make the higher-level code realize that the COPY command failed through a return-code and have it not pass the data down but the return codes didn't appear to be very well defined to handle that kind of communication... (As in, it looked like trying to make that happen would break other things), I can look at it again though. Thanks, Stephen signature.asc Description: Digital signature
Re: [PATCHES] pg_restore COPY error handling
* Tom Lane ([EMAIL PROTECTED]) wrote: ISTM you should be depending on the archive structure: at some level, at least, pg_restore knows darn well whether it is dealing with table data or SQL commands. Alright, to do this, ExecuteSqlCommandBuf would need to be modified to return an error-code other than 1 for when a command fails. Thankfully, only pg_backup_archiver.c uses ExecuteSqlCommandBuf, in ahwrite. ahwrite would then need to return a value when there's an error (at the moment it's defined to return int but everything appears to ignore its return value). We'd then need ahprintf to return a negative value when it fails (at the moment it returns 'cnt' which appears to be the size of what was written, except that nothing looks at the return value of ahprintf either). Once we have ahprintf returning a negative value when it fails, we can modify pg_backup_archiver.c around line 332 to check if the ahprintf failed for a COPY statement and if so to skip the: (*AH-PrintTocDataPtr) (AH, te, ropt); // on line 335. I'd be happy to work this up, and I think it would work, but it seems kind of ugly since then we'd have ahwrite and ahprintf returning error codes which in 99% of the cases wouldn't be checked which doesn't seem terribly clean. :/ Thanks, Stephen signature.asc Description: Digital signature
Re: [PATCHES] pg_restore COPY error handling
* Stephen Frost ([EMAIL PROTECTED]) wrote: * Tom Lane ([EMAIL PROTECTED]) wrote: ISTM you should be depending on the archive structure: at some level, at least, pg_restore knows darn well whether it is dealing with table data or SQL commands. [...] I'd be happy to work this up, and I think it would work, but it seems kind of ugly since then we'd have ahwrite and ahprintf returning error codes which in 99% of the cases wouldn't be checked which doesn't seem terribly clean. :/ Looking at this again, I'm not 100% sure it'd work quite that cleanly. I'm not sure you can just skip that PrintTocDataPtr call, depending on the how the input is coming in... There might be a way to modify _PrintData (in pg_backup_custom.c, and the others, which is what is the function behind PrintTocDataPtr) to somehow check an AH variable which essentially says the data command failed, just ignore the input, and we'd need to set the AH variable somewhere else, perhaps using the return value setup I described before... All of this is sounding rather invasive though. I have to admit that I'm not exactly a big fan of the pg_dump/pg_restore modular system. :/ Thanks, Stephen signature.asc Description: Digital signature
Re: [PATCHES] pg_restore COPY error handling
* Tom Lane ([EMAIL PROTECTED]) wrote: Stephen Frost [EMAIL PROTECTED] writes: * Tom Lane ([EMAIL PROTECTED]) wrote: This is not super surprising because the original design approach for pg_restore was bomb out on any sort of difficulty whatsoever. That was justly complained about, and now it will try to keep going after SQL-command errors ... but it sounds like the COPY-data processing part didn't get fixed properly. Exactly so. Possibly because it appears to be non-trivial to detect when it was a *COPY* command which failed (to differentiate it from some other command). What I did was check for 'whitespaceCOPY'. I'd be happy to change that if anyone has a better suggestion though. ISTM you should be depending on the archive structure: at some level, at least, pg_restore knows darn well whether it is dealing with table data or SQL commands. Right, it does higher up in the code but I don't believe that knowledge is passed down to the level it needs to be because it wasn't necessary before and the module API didn't include a way to pass that information into a given module. I expect this is why the code currently depends on libpq to tell it when it has entered a 'copy in' state. I'll look into this though and see just how invasive it would be to get the information to that level. Also, it might be possible to depend on whether libpq has entered the copy in state. I'm not sure this works very well, because if there's an error in the COPY command itself (not in the following data) then we probably don't ever get into copy in state. This is what the code currently depends on, and libpq (properly, imv) doesn't enter the 'copy in' state when the COPY command itself fails. That's exactly how we end up in this situation where pg_restore thinks it's looking at a SQL command (because libpq said it wasn't in a COPY state) when it's really getting COPY data from the higher level in the code. Thanks, Stephen signature.asc Description: Digital signature
Re: [PATCHES] pg_restore COPY error handling
Bruce, et al, * Bruce Momjian (pgman@candle.pha.pa.us) wrote: This needs a little style cleanup but I can do that. Thanks! Your patch has been added to the PostgreSQL unapplied patches list at: http://momjian.postgresql.org/cgi-bin/pgpatches It will be applied as soon as one of the PostgreSQL committers reviews and approves it. Great! It'd be really nice to have this fix in 8.1.3... :) Thanks again, Stephen --------------------------------------------------------------------------- Stephen Frost wrote: * Stephen Frost ([EMAIL PROTECTED]) wrote: Needs to be changed to handle whitespace in front of the actual 'COPY', unless someone else has a better idea. This should be reasonably trivial to do though... If you'd like me to make that change and send in a new patch, just let me know. Fixed, using isspace(). Also added an output message to make it a bit clearer when a failed COPY has been detected, etc. Updated patch attached. Thanks, Stephen [ Attachment, skipping... ] -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073
[PATCHES] pg_restore COPY error handling
* Stephen Frost ([EMAIL PROTECTED]) wrote: Needs to be changed to handle whitespace in front of the actual 'COPY', unless someone else has a better idea. This should be reasonably trivial to do though... If you'd like me to make that change and send in a new patch, just let me know.

Fixed, using isspace(). Also added an output message to make it a bit clearer when a failed COPY has been detected, etc. Updated patch attached.

	Thanks,

		Stephen

Index: src/bin/pg_dump/pg_backup_db.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/bin/pg_dump/pg_backup_db.c,v
retrieving revision 1.66
diff -c -r1.66 pg_backup_db.c
*** src/bin/pg_dump/pg_backup_db.c	15 Oct 2005 02:49:38 -	1.66
--- src/bin/pg_dump/pg_backup_db.c	21 Jan 2006 19:54:56 -
***************
*** 292,297 ****
--- 292,298 ----
  	PGconn	   *conn = AH->connection;
  	PGresult   *res;
  	char		errStmt[DB_MAX_ERR_STMT];
+ 	int			wsoffset = 0;
  
  	/* fprintf(stderr, "Executing: '%s'\n\n", qry->data); */
  	res = PQexec(conn, qry->data);
***************
*** 306,311 ****
--- 307,317 ----
  	}
  	else
  	{
+ 		/* Catch that this is a failed copy command, and
+ 		 * set pgCopyIn accordingly */
+ 		while (isspace(qry->data[wsoffset])) wsoffset++;
+ 		if (strncasecmp(qry->data+wsoffset, "COPY ", 5) == 0)
+ 			AH->pgCopyIn = -1;
+ 
  		strncpy(errStmt, qry->data, DB_MAX_ERR_STMT);
  		if (errStmt[DB_MAX_ERR_STMT - 1] != '\0')
  		{
***************
*** 317,322 ****
--- 323,330 ----
  		warn_or_die_horribly(AH, modulename, "%s: %sCommand was: %s\n",
  							 desc, PQerrorMessage(AH->connection), errStmt);
+ 
+ 		if (AH->pgCopyIn == -1)
+ 			write_msg(NULL, "COPY failed, skipping COPY data.\n");
  	}
  }
***************
*** 389,395 ****
  	 *-
  	 */
! 	if (PQputline(AH->connection, AH->pgCopyBuf->data) != 0)
  		die_horribly(AH, modulename, "error returned by PQputline\n");
  	resetPQExpBuffer(AH->pgCopyBuf);
--- 397,405 ----
  	 *-
  	 */
! 	/* If this is a failed copy command (pgCopyIn == -1) then just
! 	 * fall through */
! 	if (AH->pgCopyIn == 1 && PQputline(AH->connection, AH->pgCopyBuf->data) != 0)
  		die_horribly(AH, modulename, "error returned by PQputline\n");
  	resetPQExpBuffer(AH->pgCopyBuf);
***************
*** 400,406 ****
  	if (isEnd)
  	{
! 		if (PQendcopy(AH->connection) != 0)
  			die_horribly(AH, modulename, "error returned by PQendcopy\n");
  		AH->pgCopyIn = 0;
--- 410,418 ----
  	if (isEnd)
  	{
! 		/* If this is a failed copy command (pgCopyIn == -1) then just
! 		 * fall through */
! 		if (AH->pgCopyIn == 1 && PQendcopy(AH->connection) != 0)
  			die_horribly(AH, modulename, "error returned by PQendcopy\n");
  		AH->pgCopyIn = 0;
***************
*** 615,621 ****
  	/* Could switch between command and COPY IN mode at each line */
  	while (qry < eos)
  	{
! 		if (AH->pgCopyIn)
  			qry = _sendCopyLine(AH, qry, eos);
  		else
  			qry = _sendSQLLine(AH, qry, eos);
--- 627,637 ----
  	/* Could switch between command and COPY IN mode at each line */
  	while (qry < eos)
  	{
! 		/* If this is a working COPY *or* a failed COPY, call
! 		 * _sendCopyLine to handle the incoming data from the COPY
! 		 * command, it will just circular-file the data if we're
! 		 * running a failed COPY. */
! 		if (AH->pgCopyIn == 1 || AH->pgCopyIn == -1)
  			qry = _sendCopyLine(AH, qry, eos);
  		else
  			qry = _sendSQLLine(AH, qry, eos);
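The tri-state pgCopyIn used by the patch (1 for a live COPY, -1 for a failed one, 0 otherwise) routes each restore line as sketched below; the counters are illustrative stand-ins for the real PQputline()/PQexec() calls.

```c
/* Stand-in counters for where each restore line ends up. */
int put_lines = 0, swallowed = 0, sql_cmds = 0;

/* pgCopyIn:  1 -> live COPY: forward the data line to the backend,
 *           -1 -> failed COPY: consume the line but send nothing,
 *            0 -> not in COPY: execute the line as SQL. */
void send_line(int pgCopyIn)
{
    if (pgCopyIn == 1 || pgCopyIn == -1)
    {
        /* _sendCopyLine() path: only a live COPY talks to libpq */
        if (pgCopyIn == 1)
            put_lines++;
        else
            swallowed++;        /* circular-file the data */
    }
    else
        sql_cmds++;             /* _sendSQLLine() path */
}
```

The -1 state is what keeps data from a failed COPY out of the SQL path without aborting the restore.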
[PATCHES] [PATCH] pg_restore COPY error handling
* Tom Lane ([EMAIL PROTECTED]) wrote: Stephen Frost [EMAIL PROTECTED] writes: It seems like pg_restore really should be able to handle COPY errors correctly by skipping to the end of the COPY data segment when the initial COPY command comes back as an error. Send a patch ;-) This is what I get for knowing how to copy paste C code, eh? ;-) Attached is a patch to pg_restore, against HEAD but I think it'd work against 8.1 just fine, to better handle it when a COPY command fails (for whatever reason) during a DB restore. Attempting to restore from a dump with 2 table objects under 8.1: [EMAIL PROTECTED]:/data/sfrost/postgres pg_restore -d tsf -Fc tiger_test.dump pg_restore: [archiver (db)] Error while PROCESSING TOC: pg_restore: [archiver (db)] Error from TOC entry 412; 16672 1135494071 SCHEMA tiger_test postgres pg_restore: [archiver (db)] could not execute query: ERROR: permission denied for database tsf Command was: CREATE SCHEMA tiger_test; pg_restore: [archiver (db)] could not execute query: ERROR: schema tiger_test does not exist Command was: ALTER SCHEMA tiger_test OWNER TO postgres; pg_restore: [archiver] Error from TOC entry 21177; 1259 1135494072 TABLE bg01_d00 postgres pg_restore: [archiver] could not set search_path to tiger_test: ERROR: schema tiger_test does not exist pg_restore: [archiver (db)] could not execute query: ERROR: type public.geometry does not exist Command was: CREATE TABLE bg01_d00 ( ogc_fid integer, wkb_geometry public.geometry, area numeric(20,5), perimeter numeric... pg_restore: [archiver (db)] could not execute query: ERROR: schema tiger_test does not exist Command was: ALTER TABLE tiger_test.bg01_d00 OWNER TO postgres; pg_restore: [archiver (db)] Error from TOC entry 21178; 1259 1135497928 TABLE bg02_d00 postgres pg_restore: [archiver (db)] could not execute query: ERROR: type public.geometry does not exist Command was: CREATE TABLE bg02_d00 ( ogc_fid integer, wkb_geometry public.geometry, area numeric(20,5), perimeter numeric... 
pg_restore: [archiver (db)] could not execute query: ERROR: schema tiger_test does not exist Command was: ALTER TABLE tiger_test.bg02_d00 OWNER TO postgres; pg_restore: [archiver (db)] Error from TOC entry 21609; 0 1135494072 TABLE DATA bg01_d00 postgres pg_restore: [archiver (db)] could not execute query: ERROR: relation bg01_d00 does not exist Command was: COPY bg01_d00 (ogc_fid, wkb_geometry, area, perimeter, bg01_d00_, bg01_d00_i, state, county, tract, blkgroup, name, lsad, ls... pg_restore: [archiver (db)] Error from TOC entry 21610; 0 1135497928 TABLE DATA bg02_d00 postgres pg_restore: [archiver (db)] could not execute query: ERROR: syntax error at or near 1 at character 1 Command was: 1 01030001007D00F1283A3752FB55C0E38C825CB98041403BC3D4963AFB55C0D8D30E7F4D80414015376E313FFB55C07AE40F069E7F4140... WARNING: errors ignored on restore: 9 As you can see, it's treating the data (the 0103 bit) as a command, which is most certainly not right, especially when it *knows* that the COPY command failed. Attempting to restore from a dump with 2 table objects with patch: [EMAIL PROTECTED]:/data/sfrost/postgres/testinstall bin/pg_restore -d tsf -Fc -h localhost ../tiger_test.dump pg_restore: [archiver (db)] Error while PROCESSING TOC: pg_restore: [archiver (db)] Error from TOC entry 412; 16672 1135494071 SCHEMA tiger_test postgres pg_restore: [archiver (db)] could not execute query: ERROR: role postgres does not exist Command was: ALTER SCHEMA tiger_test OWNER TO postgres; pg_restore: [archiver (db)] Error from TOC entry 21177; 1259 1135494072 TABLE bg01_d00 postgres pg_restore: [archiver (db)] could not execute query: ERROR: type public.geometry does not exist Command was: CREATE TABLE bg01_d00 ( ogc_fid integer, wkb_geometry public.geometry, area numeric(20,5), perimeter numeric... 
pg_restore: [archiver (db)] could not execute query: ERROR: relation tiger_test.bg01_d00 does not exist Command was: ALTER TABLE tiger_test.bg01_d00 OWNER TO postgres; pg_restore: [archiver (db)] Error from TOC entry 21178; 1259 1135497928 TABLE bg02_d00 postgres pg_restore: [archiver (db)] could not execute query: ERROR: type public.geometry does not exist Command was: CREATE TABLE bg02_d00 ( ogc_fid integer, wkb_geometry public.geometry, area numeric(20,5), perimeter numeric... pg_restore: [archiver (db)] could not execute query: ERROR: relation tiger_test.bg02_d00 does not exist Command was: ALTER TABLE tiger_test.bg02_d00 OWNER TO postgres; pg_restore: [archiver (db)] Error from TOC entry 21609; 0 1135494072 TABLE DATA bg01_d00 postgres pg_restore: [archiver (db)] could not execute query: ERROR: relation bg01_d00 does not exist Command was: COPY bg01_d00 (ogc_fid, wkb_geometry, area, perimeter
Re: [PATCHES] pg_restore COPY error handling
* Stephen Frost ([EMAIL PROTECTED]) wrote: * Tom Lane ([EMAIL PROTECTED]) wrote: Stephen Frost [EMAIL PROTECTED] writes: It seems like pg_restore really should be able to handle COPY errors correctly by skipping to the end of the COPY data segment when the initial COPY command comes back as an error. Send a patch ;-) This is what I get for knowing how to copy-paste C code, eh? ;-) Attached is a patch to pg_restore, against HEAD but I think it'd work against 8.1 just fine, to better handle it when a COPY command fails (for whatever reason) during a DB restore. [...] Command was: COPY bg02_d00 (ogc_fid, wkb_geometry, area, perimeter, bg02_d00_, bg02_d00_i, state, county, tract, blkgroup, name, lsad, ... WARNING: errors ignored on restore: 7

Of course, looking at this again, I'm afraid my COPY-attempt-detection logic isn't quite enough. I was hoping it'd be taken care of by _sendSQLLine, but apparently not. The line:

if (strncasecmp(qry->data, "COPY ", 5) == 0)
    AH->pgCopyIn = -1;

needs to be changed to handle whitespace in front of the actual 'COPY', unless someone else has a better idea. This should be reasonably trivial to do though... If you'd like me to make that change and send in a new patch, just let me know.

Thanks, Stephen signature.asc Description: Digital signature
Re: [PATCHES] TRUNCATE, VACUUM, ANALYZE privileges
* Tom Lane ([EMAIL PROTECTED]) wrote: Stephen Frost [EMAIL PROTECTED] writes: The following patch implements individual privileges for TRUNCATE, VACUUM and ANALYZE. Includes documentation and regression test updates. Resolves TODO item 'Add a separate TRUNCATE permission'. At least the 'no one interested has written a patch' argument is gone now, fire away with other comments/concerns. :)

I have a very serious problem with the idea of inventing individual privilege bits for every maintenance command in sight. That does not scale. How will you handle GRANT ADD COLUMN, or GRANT ADD COLUMN as-long-as-its-not-SERIAL-because-I-dont-want-you-creating-sequences, or GRANT ALTER TABLE RELIABILITY as soon as someone writes that patch, or a dozen other cases that I could name without stopping for breath?

GRANT ADD COLUMN, etc, aren't maintenance commands, they're DDL statements and as such should be the purview of the owner. TRUNCATE, VACUUM and ANALYZE are DML commands which a user of the table would use through the normal course of inserting, updating or deleting data in the table.

The proposed patch eats three of the five available privilege bits (that is, available without accepting the distributed cost of enlarging ACL bitmasks), and you've made no case at all why we should spend that limited resource in this particular fashion.

I've shown a specific use-case for this. It's been asked for before by others. I've shown why these particular ones make sense (while 'ADD COLUMN', etc, don't). If we come up with more Postgres-specific DML statements which aren't covered by other grants (which doesn't seem terribly likely at this point) then we should add those. I could see making VACUUM and ANALYZE use the same bit (since one implies the other) but I'm not really a big fan of that and I don't see any other need for these bits coming down the line anytime soon.

Thanks, Stephen
Re: [PATCHES] TRUNCATE, VACUUM, ANALYZE privileges
* daveg ([EMAIL PROTECTED]) wrote: We rely heavily on truncate, as delete for large numbers of rows is very costly. An example: we copy_in batches of rows from several sources through the day to a pending-work table, with another process periodically processing the rows and sweeping them into a history table. The sweep leaves an empty pending-work table. Truncate is very efficient for this pattern. However, it means that all our jobs have to run with more permissions than they really should have, as there is no way to grant truncate. If giving truncate its very own permission is too wasteful of permission bits, perhaps having truncate be the same as delete for permissions purposes would work.

Sounds very similar to my use-case, except my users just have to suffer with delete because I don't want to grant them additional permissions. Having truncate act off of delete isn't actually an option, unfortunately, because truncate skips triggers (probably not an issue for you, certainly not one for me, but a problem with doing it in the general case). I'm not sure about you, but I know that I'd like to be able to do: TRUNCATE, insert/copy data, ANALYZE, without having to give all the other permissions associated with ownership.

Alternatively, a separate whole-table-operations permission might cover truncate and some of the alter-type things too. Of course table owner does this, but that is what I don't want everyone to be required to have.

I'm not entirely sure if that'd be better or not.. It would involve changing the structure of the ACLs to have two sets for each relation and you'd have to sometimes look at one, sometimes at the other, and possibly both in some cases...

Thanks, Stephen
[PATCHES] TRUNCATE, VACUUM, ANALYZE privileges
Greetings, The following patch implements individual privileges for TRUNCATE, VACUUM and ANALYZE. Includes documentation and regression test updates. Resolves TODO item 'Add a separate TRUNCATE permission'. Created off of current (2005/01/03) CVS TIP. At least the 'no one interested has written a patch' argument is gone now, fire away with other comments/concerns. :) Thanks, Stephen Index: doc/src/sgml/func.sgml === RCS file: /projects/cvsroot/pgsql/doc/src/sgml/func.sgml,v retrieving revision 1.301 diff -c -r1.301 func.sgml *** doc/src/sgml/func.sgml 28 Dec 2005 01:29:58 - 1.301 --- doc/src/sgml/func.sgml 4 Jan 2006 03:38:01 - *** *** 8961,8968 The desired access privilege type is specified by a text string, which must evaluate to one of the values literalSELECT/literal, literalINSERT/literal, literalUPDATE/literal, ! literalDELETE/literal, literalRULE/literal, literalREFERENCES/literal, or ! literalTRIGGER/literal. (Case of the string is not significant, however.) An example is: programlisting SELECT has_table_privilege('myschema.mytable', 'select'); --- 8961,8969 The desired access privilege type is specified by a text string, which must evaluate to one of the values literalSELECT/literal, literalINSERT/literal, literalUPDATE/literal, ! literalDELETE/literal, literalRULE/literal, literalREFERENCES/literal, ! literalTRIGGER/literal, literalTRUNCATE/literal, literalVACUUM/literal, or ! literalANALYZE/literal. (Case of the string is not significant, however.) 
An example is: programlisting SELECT has_table_privilege('myschema.mytable', 'select'); Index: doc/src/sgml/information_schema.sgml === RCS file: /projects/cvsroot/pgsql/doc/src/sgml/information_schema.sgml,v retrieving revision 1.23 diff -c -r1.23 information_schema.sgml *** doc/src/sgml/information_schema.sgml8 Dec 2005 20:48:10 - 1.23 --- doc/src/sgml/information_schema.sgml4 Jan 2006 03:38:01 - *** *** 2395,2401 Type of the privilege: literalSELECT/literal, literalDELETE/literal, literalINSERT/literal, literalUPDATE/literal, literalREFERENCES/literal, !literalRULE/literal, or literalTRIGGER/literal /entry /row --- 2395,2403 Type of the privilege: literalSELECT/literal, literalDELETE/literal, literalINSERT/literal, literalUPDATE/literal, literalREFERENCES/literal, !literalRULE/literal, literalTRIGGER/literal, !literalTRUNCATE/literal, literalVACUUM/literal, or !literalANALYZE/literal. /entry /row *** *** 3643,3649 Type of the privilege: literalSELECT/literal, literalDELETE/literal, literalINSERT/literal, literalUPDATE/literal, literalREFERENCES/literal, !literalRULE/literal, or literalTRIGGER/literal /entry /row --- 3645,3653 Type of the privilege: literalSELECT/literal, literalDELETE/literal, literalINSERT/literal, literalUPDATE/literal, literalREFERENCES/literal, !literalRULE/literal, literalTRIGGER/literal, !literalTRUNCATE/literal, literalVACUUM/literal, or !literalANALYZE/literal. 
/entry /row Index: doc/src/sgml/user-manag.sgml === RCS file: /projects/cvsroot/pgsql/doc/src/sgml/user-manag.sgml,v retrieving revision 1.33 diff -c -r1.33 user-manag.sgml *** doc/src/sgml/user-manag.sgml20 Oct 2005 19:18:00 - 1.33 --- doc/src/sgml/user-manag.sgml4 Jan 2006 03:38:01 - *** *** 296,301 --- 296,302 There are several different kinds of privilege: literalSELECT/, literalINSERT/, literalUPDATE/, literalDELETE/, literalRULE/, literalREFERENCES/, literalTRIGGER/, +literalTRUNCATE/, literalVACUUM/, literalANALYZE/, literalCREATE/, literalTEMPORARY/, literalEXECUTE/, and literalUSAGE/. For more information on the different types of privileges supported by Index: doc/src/sgml/ref/grant.sgml === RCS file: /projects/cvsroot/pgsql/doc/src/sgml/ref/grant.sgml,v retrieving revision 1.50 diff -c -r1.50 grant.sgml *** doc/src/sgml/ref/grant.sgml 20 Oct 2005 19:18:01 - 1.50 --- doc/src/sgml/ref/grant.sgml 4 Jan 2006 03:38:02 - *** *** 20,26 refsynopsisdiv synopsis ! GRANT { { SELECT | INSERT | UPDATE | DELETE | RULE | REFERENCES | TRIGGER } [,...] | ALL [ PRIVILEGES ] } ON [ TABLE ] replaceable class=PARAMETERtablename/replaceable [, ...] TO { replaceable class=PARAMETERusername/replaceable |
Re: [PATCHES] Roles - SET ROLE Updated
* Tom Lane ([EMAIL PROTECTED]) wrote: BTW, I realized we do not support granting roles to PUBLIC: regression=# create role r; CREATE ROLE regression=# grant r to public; ERROR: role public does not exist but as far as I can tell SQL99 expects this to work. Indeed, I believe you're correct, sorry about missing that. Stephen signature.asc Description: Digital signature
Re: [PATCHES] Roles - SET ROLE Updated
* Tom Lane ([EMAIL PROTECTED]) wrote: Another issue: I like the has_role() function and in fact think it needs to come in multiple variants just like has_table_privilege and friends:

has_role(name, name)
has_role(name, oid)
has_role(oid, name)
has_role(oid, oid)
has_role(name) -- implicitly has_role(current_user, ...)
has_role(oid)

However I'm a bit dubious about whether has_role isn't an invasion of application namespace. pg_has_role would be better, but we have the (mis)precedent of has_table_privilege. What do you think about calling it has_role_privilege?

I thought about that originally. It seemed a bit long to me, and I felt that having the 'privilege' of a role wasn't quite the same as having a 'role', but honestly I'm not terribly picky, and on reflection a role *is* like other objects in the catalog (I originally hadn't considered it such), so that's fine with me... has_role() was another reason I was thinking about having a separate function for 'is_member_of_role' which didn't pollute the cache, just a side-note.

Thanks, Stephen
Re: [PATCHES] Change Ownership Permission Checks
* Tom Lane ([EMAIL PROTECTED]) wrote: Stephen Frost [EMAIL PROTECTED] writes: Attached please find a patch to change how the permissions checking for alter-owner is done. With roles there can be more than one 'owner' of an object and therefore it becomes sensible to allow specific cases of ownership change for non-superusers.

Applied with minor revisions. The patch as submitted suffered a certain amount of copy-and-paste-itis (eg, trying to use pg_type_ownercheck on an opclass), and I really disliked using ACLCHECK_NOT_OWNER as the way to report "you can't assign ownership to that role because you are not a member of it". So I made a separate error message for that case.

Great, thanks! Sorry about the copy-and-paste-itis... Must have been a case I wasn't sure about. The different error message makes perfect sense. I see you also did the superuser-in-every-role change that I had included, thanks.

When writing this patch it occurred to me that we nuke our member-of-role cache for one-off lookups on occasion. I don't particularly like that, especially when we *know* it's a one-off lookup, so I was considering adding a function for the one-off lookup case, but I couldn't come up with a way to avoid a fair bit of mostly-the-same code as the current cache-regen code without making the cache-regen code a lot slower, which would negate the point. Just some thoughts.

Thanks again, Stephen
Re: [PATCHES] Disable page writes when fsync off, add GUC
* Bruce Momjian (pgman@candle.pha.pa.us) wrote: This patch disables page writes to WAL when fsync is off, because with no fsync guarantee, the page write recovery isn't useful.

This doesn't seem quite right to me. What happens with PITR? And Postgres crashes? While many people seriously distrust running w/ fsync off, I'm sure there's quite a few folks who do.

This also adds a full_page_writes GUC to turn off page writes to WAL. Some people might not want full_page_writes, but still might want fsync.

Adding an option to not do page writes to WAL seems fine to me, but I think WAL page writes should be on by default, even in the fsync=off case. If people want to turn it off, fine, for either case, since we expect they understand what it means to have it turned off, but I don't think the two options should be coupled as is being proposed.

Thanks, Stephen
Re: [PATCHES] per user/database connections limit again
* Petr Jelinek ([EMAIL PROTECTED]) wrote:

+	if (!(superuser()
+		  || ((Form_pg_database) GETSTRUCT(tuple))->datdba == GetUserId()))
+		aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_DATABASE,
+					   stmt->dbname);

This should almost certainly be a pg_database_ownercheck() call instead. The rest needs to be updated for roles, but looks like it should be pretty easy to do. Much of it just needs to be repatched; the parts that do need to be changed look to be pretty simple changes. I believe the use of SessionUserId is probably correct in this patch. This does mean that this patch will only be for canlogin roles, but that seems like it's probably correct. Handling roles w/ members would require much more thought.

Thanks, Stephen
[PATCHES] Roles - SET ROLE
* Tom Lane ([EMAIL PROTECTED]) wrote: Stephen Frost [EMAIL PROTECTED] writes: Tom, if you're watching, are you working on this? I can probably spend some time today on it, if that'd be helpful. I am not; I was hoping you'd deal with SET ROLE. Is it really much different from SET SESSION AUTHORIZATION? Here's what I've got done so far on SET ROLE. I wasn't able to get as much done as I'd hoped to. This is mostly just to get something posted before the end of today in case some might think it's more of a feature than a bug needing to be fixed (which is what I'd consider it, personally). I'll try and work on it some this weekend, but in the US it's a holiday weekend and I'm pretty busy. :/ Thanks, Stephen ? src/backend/parser/.gram.y.swp Index: src/backend/commands/variable.c === RCS file: /projects/cvsroot/pgsql/src/backend/commands/variable.c,v retrieving revision 1.109 diff -c -r1.109 variable.c *** src/backend/commands/variable.c 28 Jun 2005 05:08:55 - 1.109 --- src/backend/commands/variable.c 1 Jul 2005 21:34:16 - *** *** 564,569 --- 564,680 /* + * SET ROLE + * + * When resetting session auth after an error, we can't expect to do catalog + * lookups. Hence, the stored form of the value must provide a numeric oid + * that can be re-used directly. We store the string in the form of + * NAMEDATALEN 'x's, followed by T or F to indicate superuserness, followed + * by the numeric oid, followed by a comma, followed by the role name. + * This cannot be confused with a plain role name because of the NAMEDATALEN + * limit on names, so we can tell whether we're being passed an initial + * role name or a saved/restored value. 
+ */
+ extern char *role_string;		/* in guc.c */
+
+ const char *
+ assign_role(const char *value, bool doit, GucSource source)
+ {
+ 	Oid			roleid = InvalidOid;
+ 	bool		is_superuser = false;
+ 	const char *actual_rolename = NULL;
+ 	char	   *result;
+
+ 	if (strspn(value, "x") == NAMEDATALEN &&
+ 		(value[NAMEDATALEN] == 'T' || value[NAMEDATALEN] == 'F'))
+ 	{
+ 		/* might be a saved userid string */
+ 		Oid			savedoid;
+ 		char	   *endptr;
+
+ 		savedoid = (Oid) strtoul(value + NAMEDATALEN + 1, &endptr, 10);
+
+ 		if (endptr != value + NAMEDATALEN + 1 && *endptr == ',')
+ 		{
+ 			/* syntactically valid, so break out the data */
+ 			roleid = savedoid;
+ 			is_superuser = (value[NAMEDATALEN] == 'T');
+ 			actual_rolename = endptr + 1;
+ 		}
+ 	}
+
+ 	if (roleid == InvalidOid)
+ 	{
+ 		/* not a saved ID, so look it up */
+ 		HeapTuple	roleTup;
+
+ 		if (!IsTransactionState())
+ 		{
+ 			/*
+ 			 * Can't do catalog lookups, so fail.  The upshot of this is
+ 			 * that session_authorization cannot be set in
+ 			 * postgresql.conf, which seems like a good thing anyway.
+ 			 */
+ 			return NULL;
+ 		}
+
+ 		roleTup = SearchSysCache(AUTHNAME,
+ 								 PointerGetDatum(value),
+ 								 0, 0, 0);
+ 		if (!HeapTupleIsValid(roleTup))
+ 		{
+ 			if (source >= PGC_S_INTERACTIVE)
+ 				ereport(ERROR,
+ 						(errcode(ERRCODE_UNDEFINED_OBJECT),
+ 						 errmsg("role \"%s\" does not exist", value)));
+ 			return NULL;
+ 		}
+
+ 		roleid = HeapTupleGetOid(roleTup);
+ 		is_superuser = ((Form_pg_authid) GETSTRUCT(roleTup))->rolsuper;
+ 		actual_rolename = value;
+
+ 		ReleaseSysCache(roleTup);
+ 	}
+
+ 	if (doit)
+ 		SetRole(roleid);
+
+ 	result = (char *) malloc(NAMEDATALEN + 32 + strlen(actual_rolename));
+ 	if (!result)
+ 		return NULL;
+
+ 	memset(result, 'x', NAMEDATALEN);
+
+ 	sprintf(result + NAMEDATALEN, "%c%u,%s",
+ 			is_superuser ? 'T' : 'F',
+ 			roleid,
+ 			actual_rolename);
+
+ 	return result;
+ }
+
+ const char *
+ show_role(void)
+ {
+ 	/*
+ 	 * Extract the role name from the stored string; see
+ 	 * assign_role
+ 	 */
+ 	const char *value = role_string;
+ 	Oid			savedoid;
+ 	char	   *endptr;
+
+ 	Assert(strspn(value, "x") == NAMEDATALEN && (value
[PATCHES] Change Ownership Permission Checks
Greetings, Attached please find a patch to change how the permissions checking for alter-owner is done. With roles there can be more than one 'owner' of an object and therefore it becomes sensible to allow specific cases of ownership change for non-superusers. The permission checks for change-owner follow the alter-rename precedent that the new owner must have permission to create the object in the schema.

The roles patch previously applied did not require the role for which a database is being created to have createdb privileges, or for the role for which a schema is being created to have create privileges on the database (the role doing the creation did have to have those privileges though, of course). For 'container' type objects this seems reasonable. 'container' type objects are unlike others in a few ways, but one of the more notable differences for this case is that an owner may be specified as part of the create command.

To support cleaning up the various checks, I also went ahead and modified is_member_of_role() to always return true when asked if a superuser is in a given role. This seems reasonable, won't affect what's actually seen in the various tables, and allows us to eliminate explicit superuser() checks in a number of places.

I have also reviewed the other superuser() calls in src/backend/commands/ and feel pretty comfortable that they're all necessary, reasonable, and don't need to be replaced with *_ownercheck or other calls.
The specific changes, by file:

aggregatecmds.c, alter-owner:
    alter-owner checks:
        User is owner of the to-be-changed object
        User is a member of the new owner's role
        New owner is permitted to create objects in the schema
    Superuser() requirement removed

conversioncmds.c, rename:
    rename checks:
        Changed from superuser() or same-roleId to pg_conversion_ownercheck
    alter-owner checks:
        User is owner of the to-be-changed object
        User is a member of the new owner's role
        New owner is permitted to create objects in the schema
    Superuser() requirement removed

dbcommands.c:
    Moved superuser() check to have_createdb_privilege
    Cleaned up permissions checking in createdb and rename
    alter-owner checks:
        User is owner of the database
        User is a member of the new owner's role
        User has createdb privilege

functioncmds.c:
    alter-owner checks:
        User is owner of the function
        User is a member of the new owner's role
        New owner is permitted to create objects in the schema

opclasscmds.c:
    alter-owner checks:
        User is owner of the object
        User is a member of the new owner's role
        New owner has permission to create objects in the schema

operatorcmds.c:
    alter-owner checks:
        User is owner of the object
        User is a member of the new owner's role
        New owner has permission to create objects in the schema

schemacmds.c:
    Cleaned up create schema identity changing/setting/checking
    (This code was quite different from all the other create functions;
    these changes make it much more closely match createdb)
    alter-owner checks:
        User is owner of the schema
        User is a member of the new owner's role
        User has create privilege on database

tablecmds.c:
    alter-owner checks:
        User is owner of the object
        User is a member of the new owner's role
        New owner has permission to create objects in the schema

tablespace.c:
    alter-owner checks:
        User is owner of the tablespace
        User is a member of the new owner's role
    (No create-tablespace permission to check; tablespaces must be created
    by superusers, so alter-owner here really only matters if the superuser
    changed the tablespace owner to a non-superuser and then that
    non-superuser wants to change the ownership to yet another user.  The
    other option would be to continue to force superuser-only for
    tablespace owner changes, but I'm not sure I see the point if the
    superuser trusts the non-superuser enough to give them a tablespace...)

typecmds.c:
    alter-owner checks:
        User is owner of the object
        User is a member of the new owner's role
        New owner has permission to create objects in the schema

Many thanks. As always, comments, questions, concerns, please let me know.

Thanks again, Stephen

? src/backend/commands/.typecmds.c.swp
Index: src/backend/commands/aggregatecmds.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/commands/aggregatecmds.c,v
retrieving revision 1.27
diff -c -r1.27 aggregatecmds.c
*** src/backend/commands/aggregatecmds.c	28 Jun 2005 05:08:53 -	1.27
--- src/backend/commands/aggregatecmds.c	29 Jun 2005 15:29:57 -
*** 299,307
Re: [PATCHES] Grammar Cleanup
* Bruce Momjian (pgman@candle.pha.pa.us) wrote: I am removing this patch from the queue because without the role feature I don't think it makes sense to rename the grammar tokens.

Sure. I've rolled this patch into my role tree, so these changes will be included when CREATE ROLE, etc, is introduced.

Thanks, Stephen

Stephen Frost wrote: * Tom Lane ([EMAIL PROTECTED]) wrote: Stephen Frost [EMAIL PROTECTED] writes: Ok, should I change SchemaName & SavePointId back to ColId, I'd just leave them as ColId. I don't think much would be gained by introducing those productions. Done, here's the patch. Thanks, Stephen [ Attachment, skipping... ]

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

-- Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
[PATCHES] Move get_grosysid() to utils/cache/lsyscache.c
Greetings, Small patch to move get_grosysid() from catalog/aclchk.c to utils/cache/lsyscache.c where it can be used by other things. Also cleans up both get_usesysid() and get_grosysid() a bit. This is in preparation for 'Group Ownership' support. Thanks, Stephen

diff -u -u -r1.107 aclchk.c
--- src/backend/catalog/aclchk.c	29 Aug 2004 05:06:41 -	1.107
+++ src/backend/catalog/aclchk.c	29 Dec 2004 16:32:57 -
@@ -1208,28 +1208,6 @@
 	return NULL;				/* appease compiler */
 }
 
-
-AclId
-get_grosysid(char *groname)
-{
-	HeapTuple	tuple;
-	AclId		id = 0;
-
-	tuple = SearchSysCache(GRONAME,
-						   PointerGetDatum(groname),
-						   0, 0, 0);
-	if (HeapTupleIsValid(tuple))
-	{
-		id = ((Form_pg_group) GETSTRUCT(tuple))->grosysid;
-		ReleaseSysCache(tuple);
-	}
-	else
-		ereport(ERROR,
-				(errcode(ERRCODE_UNDEFINED_OBJECT),
-				 errmsg("group \"%s\" does not exist", groname)));
-	return id;
-}
-
 /*
  * Convert group ID to name, or return NULL if group can't be found
  */
diff -u -u -r1.118 lsyscache.c
--- src/backend/utils/cache/lsyscache.c	5 Nov 2004 19:16:14 -	1.118
+++ src/backend/utils/cache/lsyscache.c	29 Dec 2004 16:32:58 -
@@ -25,6 +25,7 @@
 #include "catalog/pg_operator.h"
 #include "catalog/pg_proc.h"
 #include "catalog/pg_shadow.h"
+#include "catalog/pg_group.h"
 #include "catalog/pg_statistic.h"
 #include "catalog/pg_type.h"
 #include "nodes/makefuncs.h"
@@ -2032,7 +2033,7 @@
 AclId
 get_usesysid(const char *username)
 {
-	int32		result;
+	AclId		userId;
 	HeapTuple	userTup;
 
 	userTup = SearchSysCache(SHADOWNAME,
@@ -2043,9 +2044,39 @@
 				(errcode(ERRCODE_UNDEFINED_OBJECT),
 				 errmsg("user \"%s\" does not exist", username)));
 
-	result = ((Form_pg_shadow) GETSTRUCT(userTup))->usesysid;
+	userId = ((Form_pg_shadow) GETSTRUCT(userTup))->usesysid;
 	ReleaseSysCache(userTup);
-	return result;
+	return userId;
+}
+
+/*
+ * get_grosysid
+ *
+ * Given a group name, look up the group's sysid.
+ * Raises an error if no such group (rather than returning zero,
+ * which might possibly be a valid grosysid).
+ *
+ */
+AclId
+get_grosysid(char *groname)
+{
+	AclId		groupId;
+	HeapTuple	groupTup;
+
+	groupTup = SearchSysCache(GRONAME,
+							  PointerGetDatum(groname),
+							  0, 0, 0);
+	if (!HeapTupleIsValid(groupTup))
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("group \"%s\" does not exist", groname)));
+
+	groupId = ((Form_pg_group) GETSTRUCT(groupTup))->grosysid;
+
+	ReleaseSysCache(groupTup);
+
+	return groupId;
+}
+
Index: src/include/utils/acl.h
===
RCS file: /projects/cvsroot/pgsql/src/include/utils/acl.h,v
retrieving revision 1.75
diff -u -u -r1.75 acl.h
--- src/include/utils/acl.h	29 Aug 2004 05:06:58 -	1.75
+++ src/include/utils/acl.h	29 Dec 2004 16:32:58 -
@@ -245,7 +245,6 @@
  * prototypes for functions in aclchk.c
  */
 extern void ExecuteGrantStmt(GrantStmt *stmt);
-extern AclId get_grosysid(char *groname);
 extern char *get_groname(AclId grosysid);
 
 extern AclMode pg_class_aclmask(Oid table_oid, AclId userid,
Index: src/include/utils/lsyscache.h
===
RCS file: /projects/cvsroot/pgsql/src/include/utils/lsyscache.h,v
retrieving revision 1.92
diff -u -u -r1.92 lsyscache.h
--- src/include/utils/lsyscache.h	5 Nov 2004 19:16:41 -	1.92
+++ src/include/utils/lsyscache.h	29 Dec 2004 16:32:58 -
@@ -115,7 +115,8 @@
 			  Datum *values, int nvalues,
 			  float4 *numbers, int nnumbers);
 extern char *get_namespace_name(Oid nspid);
-extern int32 get_usesysid(const char *username);
+extern AclId get_usesysid(const char *username);
+extern AclId get_grosysid(char *groname);
 
 #define is_array_type(typid)  (get_element_type(typid) != InvalidOid)

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] Grammar Cleanup
* Tom Lane ([EMAIL PROTECTED]) wrote: Stephen Frost [EMAIL PROTECTED] writes: Small patch to clean up the grammar a bit by adding 'GroupId', 'SchemaName' and 'SavePointId'.

I don't particularly see the value of this --- especially since the direction of future development is likely to be to remove the distinction between user and group names, rather than make it stronger.

Alright, I can redo it w/ UserId in place of GroupId everywhere. I'd like your comment on the 'pg_class changes for group ownership' on -hackers, which might play into these changes too. Previously 'GRANT' had 'ColId' instead of either, which isn't really appropriate there... Do you agree with the other changes (ColId -> SchemaName, ColId -> SavePointId)?

Thanks, Stephen
Re: [PATCHES] Grammar Cleanup
* Tom Lane ([EMAIL PROTECTED]) wrote: Stephen Frost [EMAIL PROTECTED] writes: Do you agree with the other changes (ColId -> SchemaName, ColId -> SavePointId)?

I don't really see the value of them. They add some marginal documentation I suppose, but they also make the grammar bigger and slower. A more substantial objection to the practice is that it can introduce needless shift/reduce conflicts, by forcing the parser into making unnecessary decisions before it has enough context to determine what kind of name a particular token really is.

Perhaps the name of 'ColId' should be changed then. At least to me that comes across as 'Column Identifier', and apparently some others thought it wasn't appropriate for user names (UserId existed and was mapped to ColId prior to my patch). Personally, I'd just like it to be consistent; when I was looking at how to add the grammar for group ownership, group names were identified in one place as 'ColId' and in another as 'UserId', iirc.

Stephen
Re: [PATCHES] Grammar Cleanup
* Tom Lane ([EMAIL PROTECTED]) wrote: Given other discussion, it might be best to rename it to RoleId and use that for both users and groups.

Ok, should I change SchemaName & SavePointId back to ColId, leave them as in the patch, change them to RoleId, or something else? Neither ColId nor RoleId seems appropriate for those...

Thanks, Stephen
Re: [PATCHES] Grammar Cleanup
* Tom Lane ([EMAIL PROTECTED]) wrote:
> Stephen Frost [EMAIL PROTECTED] writes:
> > Ok, should I change SchemaName / SavePointId back to ColId,
>
> I'd just leave them as ColId.  I don't think much would be gained by
> introducing those productions.

Done, here's the patch.

	Thanks,

		Stephen

diff -u -u -r2.480 gram.y
--- src/backend/parser/gram.y	8 Nov 2004 04:02:20 -	2.480
+++ src/backend/parser/gram.y	29 Dec 2004 19:04:29 -
@@ -304,7 +304,7 @@
 %type <ival>	Iconst
 %type <str>	Sconst comment_text
-%type <str>	UserId opt_boolean ColId_or_Sconst
+%type <str>	RoleId opt_boolean ColId_or_Sconst
 %type <list>	var_list var_list_or_default
 %type <str>	ColId ColLabel var_name type_name param_name
 %type <node>	var_value zone_value
@@ -570,7 +570,7 @@
  */
 CreateUserStmt:
-			CREATE USER UserId opt_with OptUserList
+			CREATE USER RoleId opt_with OptUserList
 				{
 					CreateUserStmt *n = makeNode(CreateUserStmt);
 					n->user = $3;
@@ -592,7 +592,7 @@
  */
 AlterUserStmt:
-			ALTER USER UserId opt_with OptUserList
+			ALTER USER RoleId opt_with OptUserList
 				{
 					AlterUserStmt *n = makeNode(AlterUserStmt);
 					n->user = $3;
@@ -603,7 +603,7 @@
 AlterUserSetStmt:
-			ALTER USER UserId SET set_rest
+			ALTER USER RoleId SET set_rest
 				{
 					AlterUserSetStmt *n = makeNode(AlterUserSetStmt);
 					n->user = $3;
 					n->value = $5->args;
 					$$ = (Node *)n;
 				}
-			| ALTER USER UserId VariableResetStmt
+			| ALTER USER RoleId VariableResetStmt
 				{
 					AlterUserSetStmt *n = makeNode(AlterUserSetStmt);
 					n->user = $3;
@@ -691,8 +691,8 @@
 				}
 		;
-user_list:	user_list ',' UserId	{ $$ = lappend($1, makeString($3)); }
-			| UserId				{ $$ = list_make1(makeString($1)); }
+user_list:	user_list ',' RoleId	{ $$ = lappend($1, makeString($3)); }
+			| RoleId				{ $$ = list_make1(makeString($1)); }
 		;
@@ -705,7 +705,7 @@
  */
 CreateGroupStmt:
-			CREATE GROUP_P UserId opt_with OptGroupList
+			CREATE GROUP_P RoleId opt_with OptGroupList
 				{
 					CreateGroupStmt *n = makeNode(CreateGroupStmt);
 					n->name = $3;
@@ -742,7 +742,7 @@
  */
 AlterGroupStmt:
-			ALTER GROUP_P UserId add_drop USER user_list
+			ALTER GROUP_P RoleId add_drop USER user_list
 				{
 					AlterGroupStmt *n = makeNode(AlterGroupStmt);
 					n->name = $3;
@@ -765,7 +765,7 @@
 */
 DropGroupStmt:
-			DROP GROUP_P UserId
+			DROP GROUP_P RoleId
 				{
 					DropGroupStmt *n = makeNode(DropGroupStmt);
 					n->name = $3;
@@ -781,7 +781,7 @@
 */
 CreateSchemaStmt:
-			CREATE SCHEMA OptSchemaName AUTHORIZATION UserId OptSchemaEltList
+			CREATE SCHEMA OptSchemaName AUTHORIZATION RoleId OptSchemaEltList
 				{
 					CreateSchemaStmt *n = makeNode(CreateSchemaStmt);
 					/* One can omit the schema name or the authorization id. */
@@ -1300,8 +1300,8 @@
 /* Subcommands that are for ALTER TABLE or ALTER INDEX */
 alter_rel_cmd:
-			/* ALTER [TABLE|INDEX] name OWNER TO UserId */
-			OWNER TO UserId
+			/* ALTER [TABLE|INDEX] name OWNER TO RoleId
Re: [PATCHES] Add error-checking to timestamp_recv
* Tom Lane ([EMAIL PROTECTED]) wrote:
> I said:
> > I'll make a note to do something with this issue after the TZ patch
> > is in.
>
> I've applied a patch to take care of this problem.

Great, thanks, much appreciated.  I'll test once 7.5 goes beta.

	Stephen
Re: [PATCHES] Add error-checking to timestamp_recv
* Tom Lane ([EMAIL PROTECTED]) wrote:
> Stephen Frost [EMAIL PROTECTED] writes:
> > * Bruce Momjian ([EMAIL PROTECTED]) wrote:
> > > Would you show an example of the invalid value this is trying to
> > > avoid?
> >
> > Well, the way I discovered the problem was by sending a timestamp in
> > double format when the server was expecting one in int64 format.
>
> Most of the time, though, this sort of error would still yield a valid
> value that just failed to represent the timestamp value you wanted.
> I'm unsure that a range check is going to help much.

I'm not sure I agree with the 'most of the time' comment- I haven't done
extensive tests yet, but I wouldn't be at all surprised if most of the
range around the current date, when stored as a double and passed to a
database expecting an int64, would cause the problem.  The issue isn't
about the wrong date being shown or anything; it's that the database
accepts the value but then throws errors whenever you try to view the
table.

The intent of the patch was to use the exact same logic to test whether
the date is valid on the incoming side as is used to test the date
before displaying it.  This would hopefully make it impossible for
someone to run into this problem in the future.  It would also make it
much clearer to the programmer that the date he's passing is bad, and
not that there's some corruption happening in the database after the
date is accepted.

	Stephen
Re: [PATCHES] Add error-checking to timestamp_recv
* Bruce Momjian ([EMAIL PROTECTED]) wrote:
> Stephen Frost wrote:
> > * Bruce Momjian ([EMAIL PROTECTED]) wrote:
> > > Would you show an example of the invalid value this is trying to
> > > avoid?
> >
> > Well, the way I discovered the problem was by sending a timestamp in
> > double format when the server was expecting one in int64 format.
> > This is when using the binary data method for timestamps.  I'll
> > generate a small example program/schema later today and post it to
> > the list.
>
> So you are passing the data via binary COPY or a C function?

Sorry I wasn't clear; it's using libpq and binary data in an 'insert'
statement.  The code looked something like this:

  PQexec(connection, "prepare addrow (timestamp) as insert into mytable values ($1)");
  lengths[0] = sizeof(double);
  formats[0] = 1;
  *(double*)(values[0]) = tv_sec - ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * 86400) + (tv_usec / 1000000.00);
  PQexecPrepared(connection, "addrow", 1, (void*)values, lengths, formats, 0);

While the new code is something like:

  int64_t pg_timestamp;
  PQexec(connection, "prepare addrow (timestamp) as insert into mytable values ($1)");
  lengths[0] = sizeof(int64_t);
  formats[0] = 1;
  pg_timestamp = ((tv_sec - ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * 86400)) * (int64_t)1000000) + tv_usec;
  *(int64_t*)(values[0]) = bswap_64(pg_timestamp);
  PQexecPrepared(connection, "addrow", 1, (void*)values, lengths, formats, 0);

I'll see about writing up a proper test case/schema.  Looks like I'm
probably most of the way there at this point, really. ;)

	Stephen
Re: [PATCHES] Add error-checking to timestamp_recv
* Bruce Momjian ([EMAIL PROTECTED]) wrote:
> Stephen Frost wrote:
> > I'll see about writing up a proper test case/schema.  Looks like I'm
> > probably most of the way there at this point, really. ;)
>
> I wasn't aware you could throw binary values into timestamp fields
> like that.  I thought you needed to use a C string for the value.
> Does PREPARE bypass that for some reason?

I don't think so..  As I recall, I think I might have had it set up
without using a prepare before and it was working, but I'm not sure.
It's certainly very useful and lets me bypass *a lot* of overhead
(converting to a string and then making the database convert back...).

The one complaint I do have is that I don't see a way to pass a
timestamp w/ an explicit timezone in binary format into a table which
has a 'timestamp with time zone' field.  I can pass a binary timestamp
into a 'timestamp with time zone' field, but it's interpreted as UTC or
the local timezone (can't remember which atm).

	Stephen
Re: [PATCHES] Add error-checking to timestamp_recv
* Bruce Momjian ([EMAIL PROTECTED]) wrote:
> Considering all the other things the database is doing, I can't
> imagine that would be a measurable improvement.

It makes it easier on my client program too, which is listening on an
ethernet interface, trying to process all of the packets coming in off
of it, and putting timestamps and header information into the database.
The table in the database doesn't have any constraints or primary keys
on it or anything; it's pretty much as simple as I could make it. :)

> > The one complaint I do have is that I don't see a way to pass a
> > timestamp w/ an explicit timezone in binary format into a table
> > which has a 'timestamp with time zone' field.  I can pass a binary
> > timestamp into a 'timestamp with time zone' field, but it's
> > interpreted as UTC or the local timezone (can't remember which atm).
>
> I still do not understand how this is working.  It must be using our
> fast path as part of prepare.  What language is your client code?

It's just plain ol' C.  It's a pretty short/simple program, really.  It
uses libpcap to listen on the interface, checks the type of packet
(ethernet, IP, UDP/TCP, etc), and copies the binary header values into
the structure which it then passes to PQexecPrepared.

It's kind of amazing; under 2.6 you can actually calculate the delay and
bandwidth pretty accurately through a network (7 'backbone' nodes, each
with a backbone router, an edge router, and an access router, all in a
lab) by listening on two interfaces, one on each side, to measure
one-way propagation time.

	Stephen
Re: [PATCHES] Add error-checking to timestamp_recv
* Bruce Momjian ([EMAIL PROTECTED]) wrote:
> Tom Lane wrote:
> > Bruce Momjian [EMAIL PROTECTED] writes:
> > > I wasn't aware you could throw binary values into timestamp fields
> > > like that.  I thought you needed to use a C string for the value.
> >
> > This facility was added in 7.4 as part of the wire-protocol
> > overhaul.  It's nothing directly to do with PREPARE; you could get
> > the same result with no prepared statement using PQexecParams.
>
> Ah, no wonder I had not seen that before.  So, I guess the issue is
> how much error checking we want to have for these binary values.  I
> was a little disturbed to hear he could insert data he couldn't later
> view.  How many datatypes have this issue?

I don't think that many do..  A number of them already check incoming
values where it's possible for them to not be valid.  For example,
'macaddr' accepts all possible binary values, while 'inet' does error
checking on input.  Binary timestamps were the only place I found in
the work I was doing where this could happen, and I managed to mess up
most of the fields in one way or another before I figured it all out. :)

	Stephen