Re: [PATCHES] Database owner installable modules patch
On Sat, May 10, 2008 at 11:02 AM, Bruce Momjian <[EMAIL PROTECTED]> wrote: > > Where are we on this? I haven't had time to do any work since the original patch. That patch was fairly basic - it just ran install / uninstall scripts and created catalog entries, and introduced some slightly exotic syntax to do it (INSTALL/UNINSTALL vs CREATE/DROP). The next version is intended to handle dependencies, which should make uninstallation straight forward for most cases. I was intending to revert the syntax creativity and make the commands CREATE/DROP too. I'll get a bit of time to look at both this and the enum patch this week. Cheers Tom -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [HACKERS] Database owner installable modules patch
On Mon, Apr 7, 2008 at 7:55 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > "Tom Dunstan" <[EMAIL PROTECTED]> writes: > > OK, I found an example that does NOT fit the "just drop all > > dependencies" scenario, but that I would still like to support. I just > > had a look at the postgis pl/java support, and its install does stuff > > like "SELECT sqlj.install_jar('file://${PWD}/postgis_pljava.jar', > > 'postgis_pljava_jar', false);" and "SELECT > > sqlj.add_type_mapping('geometry', 'org.postgis.pljava.PLJGeometry');". > > There's no way we can deal with that sort of thing automatically, so > > we'll have to support uninstall scripts regardless. > > Well, that just begs the question of what those commands actually *do*. > It seems not unlikely that they'd be inserting data into tables that > would belong to the module, in which case an uninstall that dropped > the table would be fine. Those tables belong to a *different* module, though. I'm picturing three modules here: pljava, postgis, and a postgis-pljava support module that requires the first two, since it should be possible to install postgis without requiring pljava. The above stuff was from the install script of the postgis-pljava code, but inserted data into tables owned by pljava. Cheers Tom -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [HACKERS] Database owner installable modules patch
Sorry to keep replying to myself, but part of the point of doing a patch was to force myself (and whoever else is interested to examine stuff that comes up... On Mon, Apr 7, 2008 at 11:46 AM, Tom Dunstan <[EMAIL PROTECTED]> wrote: > None of that suggests that an uninstaller script would be needed if we > understood the deps well enough, but only allowing creates for > installs seems a bit restrictive. OK, I found an example that does NOT fit the "just drop all dependencies" scenario, but that I would still like to support. I just had a look at the postgis pl/java support, and its install does stuff like "SELECT sqlj.install_jar('file://${PWD}/postgis_pljava.jar', 'postgis_pljava_jar', false);" and "SELECT sqlj.add_type_mapping('geometry', 'org.postgis.pljava.PLJGeometry');". There's no way we can deal with that sort of thing automatically, so we'll have to support uninstall scripts regardless. The question then becomes: is it worth trying to do stuff automatically if we provide a manual method anyway? I think the answer is probably yes, because having pgsql clean up automatically for the vast majority of cases is a good thing. If it's only exotic cases that need a manual uninstall script, why force one on everyone else? Cheers Tom -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [HACKERS] Database owner installable modules patch
On Mon, Apr 7, 2008 at 11:46 AM, Tom Dunstan <[EMAIL PROTECTED]> wrote: > On Mon, Apr 7, 2008 at 3:59 AM, Gregory Stark <[EMAIL PROTECTED]> wrote: > > I wonder if there's much of a use case for any statements aside from > CREATE > > statements. If we restrict it to CREATE statements we could hack things to > > create pg_depend entries automatically. In which case we wouldn't need an > > uninstall script at all. > > The hacks to do this seem pretty dirty but on the other hand the idea of > > having modules consist of a bunch of objects rather than arbitrary SQL > > actually seems cleaner and more robust. > > It *does* seem cleaner for the examples that I've looked at. Are they > totally representative though? Not sure. It also implies a bunch more > work to create stuff, as we need to understand what's going on so as > to create those pg_depend entries. This has been bouncing around in my head a bit. I was picturing the module code itself having to understand all the CREATE statements in order to set up the dependencies... but perhaps an easier way would simply be to have the create statements themselves insert a pg_depend entry when they're done, if they detect that we're currently installing a module. There's already a flag for that that the superuser code looks at in the patch. Maybe you were ahead of me, and this was the hack that you were referring to. :) I tend to hate global flags like that because they leave weird non-obvious dependencies across the codebase, but perhaps it's the best way to do it in this case. It would mean hacking every create command in the system to understand it, though. :( Cheers Tom -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [HACKERS] Database owner installable modules patch
On Mon, Apr 7, 2008 at 3:59 AM, Gregory Stark <[EMAIL PROTECTED]> wrote: > I wonder if there's much of a use case for any statements aside from CREATE > statements. If we restrict it to CREATE statements we could hack things to > create pg_depend entries automatically. In which case we wouldn't need an > uninstall script at all. Well, the example that got me interested in this stuff originally was trying to make pl/java easier to install. It does a bunch of CREATEs... and some GRANTs. Plus ISTM that a pretty common case might be to create a table for some reference data and then fill it with default values. Also, I just had a look at the postgis install script, which at the very least seems to update an opclass entry after creating it. None of that suggests that an uninstaller script would be needed if we understood the deps well enough, but only allowing creates for installs seems a bit restrictive. One thing that's nice about arbitrary sql for install / uninstall is that module authors can test it outside the context of doing an actual module installation - they just execute their scripts. > The hacks to do this seem pretty dirty but on the other hand the idea of > having modules consist of a bunch of objects rather than arbitrary SQL > actually seems cleaner and more robust. It *does* seem cleaner for the examples that I've looked at. Are they totally representative though? Not sure. It also implies a bunch more work to create stuff, as we need to understand what's going on so as to create those pg_depend entries. I'm receptive to the idea of uninstall simply attempting to drop anything related to the module in pg_depend in the correct order. I can't think of anything created by a module that we couldn't represent there, and it's a nice way of ensuring that an uninstall script cleans up properly. Cheers Tom -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
[PATCHES] Database owner installable modules patch
Hi all Here is a patch that provides an initial implementation of the module idea that was kicked around over the last few days. While there certainly wasn't consensus on list, enough people seemed interested in the idea of database-owner-installable modules that I thought it was worth having a play with. The general idea, to recap, is to have modules, whether included in the distribution a la contrib or installed separately, installed under a directory such as $pkglib_dir/modules/foo. A typical module directory might contain: - foo.so/foo.dll - install.sql - uninstall.sql - foo.conf - some-other-file-needed-by-foo-module.dat The module would be installed on the system, but the necessary scripts to install it in a particular database have not been run. In particular, the modules would not usually be install in template1. Database owners themselves can then opt to enable a particular installed module in their own database - they do not have to hassle a sysadmin to do it for them. Features of the patch: - A database owner can issue the command "INSTALL MODULE foo", and pgsql will look for a $pkglib_dir/modules/foo/install.sql file to run, and run it. - The install script can do pretty much anything - the user is treated as the superuser for the duration of the script. The main and obvious usage is to create C language functions required by the module. - An entry is created in a new pg_module catalog. This is mainly to guard against someone trying to install a module twice at this point, but it may have other uses in the future (see below). - "UNINSTALL MODULE foo" looks for and executes $pkglib_dir/modules/foo/uninstall.sql and cleans up the catalog. Here is a list of things that are either still to do before I'd consider it worthy of inclusion (should the general approach be considered acceptable), or which I'd like some guidance on: - Currently the script is executed in one big SPI_execute call, and so errors and NOTICEs print out the entire script as context. I'm not sure how to break it up without writing a full parser - we must have something available in the backend to break a string up into multiple statements to execute, but I'm not sure where to look. Also, is there a better way to do this than SPI? - I've hacked in a bit of an end-run around permissions checks to make the current user look like a super-user while a module script is running. Is there a better way to do this? - I can't create C language functions from dlls under the modules dir. I'd like to be able to specify 'modules/foo/foo' as the library name, but the backend sees a slash and decides that must mean the path is absolute. I see two ways to fix this: change the existing code in dfmgr.c to *really* check for absolute/relative paths rather than the current hack, or I could stick in a special case for when it starts with "modules/". I thought I'd get some guidance on-list. Do people think that sticking the dll in with other resources for the module under $pkglib_dir is a bad thing? (I think having everything in one place is a very good thing myself).Is the existing check written the way it is for a particular reason, or is it just "good enough"? - It would be nice to create the empty modules dir when we install pgsql, but while I suppose hacking a Makefile to install it is the way to go, I'm not sure which one would be appropriate. - Hack pgxs to install stuff into a modules dir if we give it some appropriate flag. - I'd like to add pg_depend entries for stuff installed by the module on the pd_module entry, so that you can't drop stuff required by the module without uninstalling the module itself. There would have to be either a function or more syntax to allow a script to do that, or some sort of module descriptor that let the backend do it itself. - Once the issue of loading a dll from inside the module's directory is done, I'd like to look for an e.g. module_install() function inside there, and execute that rather than the install.sql if found. Ditto for uninstall. - Maybe a basic mechanism to allow a module to require another one. Even just a "SELECT require_module('bar')" call at the top of a script. - It would be nice to suppress NOTICEs when installing stuff - the user almost certainly doesn't care. - Pick up config files in module directories, so that a module can install and pick up config for itself rather than getting the sysadmin to hack the global custom_variable_classes setting. - Should plperl etc be done as modules so that their config can live independently as well? And to allow modules to "require" them? Some other nice to haves for some point in the future: - Have some sort of install module privilege, rather than just a check for database ownership - Allow looking for modules under some module path for e.g. /usr/local module installs - Convert existing contrib to modules where appropriate :) - I really have no idea what happens if non-ascii characters are in an install
Re: [PATCHES] Current enums patch
Tom Lane wrote: But probably making them act like identifiers is not a good idea, because I doubt we want automatic downcasing in enum_in. People wouldn't be happy if they had to write WHERE color = '"Red"' or > something like that to get at a mixed-case enum label. Ah, that's what you had in mind. Yeah, that's a bit verbose. > Let's just throw the error instead. (I agree that enum_in can just fail with "no such label", but CREATE TYPE ought to give a specific error about it.) Agreed. Andrew, you said you had a mostly-working patch already? Cheers Tom ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] Current enums patch
Tom Lane wrote: While all this reasoning is perfectly OK on its own terms, it ignores the precedents of SQL identifier handling. Maybe we should revisit the question of whether the labels are identifiers. Implying that they shouldn't be quoted like text (or should be double-quoted if required)? Originally when discussing the patch I thought that this was a good idea, but now I'm not so sure. How does one set an enum value from e.g. a JDBC PreparedStatement in that scenario? Cheers Tom ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] Current enums patch
Hm, I suppose we should apply truncate_identifier rather than letting the strings be blindly truncated (perhaps in mid-character). Should we have it throw the truncation NOTICE, or not? First thought is to do so during CREATE TYPE but not during plain enum_in(). I don't see much point in truncating. The patch I have so far gives the regression output shown below (yes, I should make the messages more consistent). In fact the truncation and associated NOTICE just strike me as confusing. [snip] > + ERROR: input value too long (74) for enum: > "abcdefghijklmnopqrsatuvwxyz012345 etc etc I was about to suggest that we just truncate the value in the input function and look it up on the basis that if it's too long, the lookup will fail and we can just tell the user that it wasn't a valid value. But if there's a valid value that has the same 63 bytes as the first 63 of the too-long input string, we'll end up looking up the valid one wrongly. So we do need to test for length and die at that point. Can we just fail with the same error message as trying to input a smaller, but similarly invalid string? At create time, we should definitely throw an error... if we just truncate the value at byte 64 (with a notice or not) we might truncate in the middle of a multi-byte character. Yuk. Cheers Tom ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PATCHES] Current enums patch
> Looks like we need to check the length on type creation > too. > > I'll fix later if not beaten to it. It works for me (ie fails with an appropriate error) locally on Linux FC6 x86-64. Perhaps this platform initializes memory to zero on allocation? I dunno. Anyway, if you can reproduce it, please have a go, I'm at work and won't be able to do anything more in depth for some time. I could have sworn that I had some bounds checking in there at one point, but it doesn't seem to be in the latest version of the code. *shrug*. It should be both in cstring_enum() in enum.c and in DefineEnum() in typecmds.c. Cheers Tom ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] Current enums patch
Tom Lane wrote: OK, I give up. :) Why? Because the whole point is that the type has to be known at parse time. Oh, duh. :) I've got it working with get_fn_expr_argtype and it seems fairly reasonable. Cool! Thanks Tom ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] Current enums patch
Tom Lane wrote: The null Datum itself obviously doesn't carry that info, but the expression tree does, and there are provisions for letting functions retrieve that info --- see get_fn_expr_rettype and get_fn_expr_argtype. Hmm. I vaguely remember that there was some feeling that the PLs wouldn't always fill out the FmgrInfo struct, but perhaps that was just the case with I/O functions. ... could we have a special rule that would look for e.g. a regtype as the first parameter if the return type is generic and there are no generic parameters? I thought about that too but don't like it much. The problem is mainly that it can only work for a constant regtype parameter. OK, I give up. :) Why? Thanks Tom ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] Current enums patch
text_enum: same problem as above, plus not acceptable to assume that it gets the right enum OID, plus very definitely not acceptable to assume that OID and typmod are the same size, plus I don't like the undocumented kluge in getBaseTypeAndTypmod that is pretending to supply the type OID for some small fraction of possible invocation cases. I think text_enum is probably toast. This was the ugliest part of the patch, and I mentioned it explicitly in the original submission because I was uncomfortable about it. The proper solution seemed to be to have another allowed cast function signature that would take the regtype rather than the typmod. That got just a little beyond what I wanted to do in the patch, and ugly as the getBaseTypeAndTypmod hack was, it seemed less intrusive. Another way to skirt the issue was to simply set the typmod of enum types to have their own OID, but a) I wasn't sure what other things the system might be inferring from the presence of a typmod, and b) you just stated that we can't assume that typmod is big enough to hold an OID anyway. I'm less concerned about the generic return type in this case, since the parser should know the return type when an explicit cast has been called. Hmm, ok, maybe not. Looks like it's using the return type of the cast function and throwing away the explicit cast information. That's not very nice, but can probably be fixed in the parser. Thoughts? Tom ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] Current enums patch
Andrew Dunstan wrote: Tom Lane wrote: enum_first, enum_last: these return ANYENUM but violate the rule that a polymorphic-result function must have a polymorphic input argument from which the parser may deduce the actual output type. Is this a tragedy when the supplied parameter gives the result type directly? I've been having a play with this. If you create a function taking an enum type as a parameter, you can feed the output from enum_first into it, regardless of the type given to enum_first. I doubt that it would be a problem in practice, but it's certainly not great. > One rather ugly possibility is that the argument is a > value of the target type --- which you can always have as null, so > >enum_first(null::rainbow) = 'red' > > where enum_first is declared as taking and returning anyenum. I don't think that'll fly. If it's passed a null value, how does enum_first know which enum type it's dealing with? Can you get the type from the null value in some way? > This > seems a bit yucky as opposed to the regtype approach in the patch, > and yet there are cases where it would be more handy --- eg, if > you are working with a table column "col" of some enum type, you > can do enum_first(col) instead of hardwiring the enum name. That's ok, as long as nulls work. > There might be other better ways, though. Thoughts? *Ponder*. In java, you can tie a generic return value to a particular class by passing the class in as a bound generic parameter... could we have a special rule that would look for e.g. a regtype as the first parameter if the return type is generic and there are no generic parameters? As a side note, the helper functions were intended for people writing functions in plpgsql or whatever, allowing them to not hardcode the values of the enum in their function. I consider them nice-to-have rather than definitely required. If we can't come up with a nice way to do them for 8.3, I'm not absolutely wedded to them. It would be *nice*, though. I really would like the cast from text, though, but I'll deal with that in another email. Regards Tom ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] Current enums patch
Tom Lane wrote: Andrew Dunstan <[EMAIL PROTECTED]> writes: If you want to review or test the feature, the attached patch can be used as a replacement for the portion that affects parse_coerce.c, and with this it compiles and passes regression. I think it's correct but it should still be OKed by at least one Tom. :-) > Barring objection from Tom D, I'll start with this version. OK, I've now had a chance to look at Andrew's update of the patch, which just seems to pass through the new arrayCoerce parameter to the find_coercion_pathway calls. It almost doesn't matter what gets passed in: find_coercion_pathway should never set that to true in our calls to it in the enum code, as we're passing ANYENUMOID through to the recursed call and that'll never hit the array coercion branch. In summary: looks good to me! Cheers Tom ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] Current enums patch
Tom Lane wrote: > Unless someone objects, I'll change this and also revert to the > enumlabel name that seems to have been used originally (it was still > used in the docs). It seems more readable somehow (I guess it's the > lack of either ascenders or descenders in "enumname"). The name/text thing is discussed downthread. I actually started out calling the field the name and changed it to the label, but perhaps I only did that in the docs. It was probably while I was writing the docs that I realized that name could refer to the enum type name or the value name, which was confusing, but "value name" was kinda cumbersome, hence "label". Change it over with my blessing. Cheers Tom ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PATCHES] Current enums patch
Hi all! Thanks for reviewing the patch! Tom Lane wrote: Andrew Dunstan <[EMAIL PROTECTED]> writes: Tom Lane wrote: Is there a specific reason for pg_enum.enumname to be type name and not type text? IIRC at one stage Tom wanted to try to make these identifiers, but that was quickly abandoned. This might be a hangover from that. Actually I think I see the reason: it's a bit of a pain in the neck to use the syscache mechanism with text-type lookup keys. I'm not 100% convinced that we really need to have syscaches on pg_enum, but if those stay then it's probably not worth the trouble to avoid the limitation. Yeah, that was the reason IIRC. The syscaches are used by the I/O functions: The one keyed on the pg_enum OID is for output, and the one keyed on the type OID and label, err, name, are for input. As suggested by a certain party here [1]. There didn't seem to be any text-like key types to use in the syscache except the name type, and I didn't see the 63 byte limit being a big deal, that's way bigger than any sane enum name that I've ever seen. If we ditched the second syscache, we'd want some other way to convert a type OID and name into the enum value oid efficiently. I originally suggested having a cache that got hooked onto an fn_extra field; that idea could be resurrected if you don't like the syscache. Cheers Tom 1[] http://archives.postgresql.org/pgsql-hackers/2006-08/msg01022.php ---(end of broadcast)--- TIP 6: explain analyze is your friend
[PATCHES] Current enums patch
Hi all Here's the current version of the enums patch. Not much change from last time, the only thought-inducing stuff was fixing up some macros that changed with the VARLENA changes, and adding a regression test to do basic checking of RI behavior, after the discussions that we had recently on the ri_trigger stuff with generic types. The actual behavior was fixed by Tom's earlier patch, so this is just a sanity check. Cheers Tom enums.patch.gz Description: GNU Zip compressed data ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PATCHES] Enums patch v2
Neil Conway wrote: On Thu, 2007-02-01 at 22:50 -0500, Bruce Momjian wrote: Where are we on this? I can commit to reviewing this. The patch looked fairly solid on a quick glance through, but I won't have the cycles to review it properly for a week or two. I've brought this up to date with the operator family stuff now, and the attached patch once more applies cleanly against HEAD. Hopefully that'll make life a little easier when reviewing it. Thanks. I got sick of manually shifting the new OID values every time someone had added something new to the catalogs, so I moved all of my OIDs up to the 9000 range. Attached is a hacky bash script which can move them back to something sane. The idea is to run unused_oids first, see where the main block of unused OIDs starts, and then run shift_oids on the (unzipped) patch file before applying the patch. We discussed something similar a while ago, but no-one ever got around to implementing the script. I've attached the script and left the OIDs in my patch in the upper range rather than just running it myself before submitting the patch as I reckon that this might be a useful convention for authors of patches which add a non-trivial number of OIDs to the catalogs. If there's agreement then anyone can feel free to commit the script to CVS or put it on the wiki or whatever. Cheers Tom enums.patch.gz Description: GNU Zip compressed data #!/bin/sh start=$1 end=$2 new_start=$3 filename=$4 if [ -z "$filename" ] ; then echo Usage: $0 start-oid end-oid new-start-oid filename exit 1 fi if [ ! -f $filename ] ; then echo $0: $filename is not a file exit 1 fi if [ $end -le $start ] ; then echo $0: End of OID range must be greater than or equal to the start exit 1 fi start_len=`echo -n $start | wc -c` end_len=`echo -n $end | wc -c` if [ $start_len -ne 4 -o $end_len -ne 4 ] ; then echo $0: Source OID range must have 4 digits exit 1 fi let new_end=$new_start+$end-$start if [ $start -le $new_start -a $end -ge $new_start -o $new_start -le $start -a $new_end -ge $start ] ; then echo $0: OID ranges may not overlap exit 1 fi i=$start j=$new_start while [ $i -le $end ] ; do #echo $i $j sed -i "s/$i/$j/g" $filename let i=i+1 let j=j+1 done ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] Enums patch v2
Peter Eisentraut wrote: An objection to enums on the ground that foreign keys can accomplish the same thing could be extended to object to any data type with a finite domain. Exactly. The extreme case is the boolean type, which could easily be represented by a two-value enum. Or, if you were feeling masochistic, a FK to a separate table. Which is easier? People regularly do stuff like having domains over finite text values, or having a FK to a separate (static) table, or using some sort of EAV. Enums are type-safe, easily ordered, relatively efficient and don't leave zillions of little static tables all over the place, a combination of attributes that none of the alternative solutions in this space present. Cheers Tom ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [PATCHES] Enums patch v2
Alvaro Herrera wrote: I don't, because there are always those that are knowledgeable enough to know how to reduce space lost to padding. So it would be nice to have 2-byte enums on-disk, and resolve them based on the column's typid. But then, I'm not familiar with the patch at all so I'm not sure if it's possible. Not with this patch, and AFAIK not possible generally, without writing separate I/O functions for each type. I'd love to be able to do that, but I don't think it's possible currently. The main stumbling block is the output function (and cast-to-text function), because output functions do not get provided the oid of the type that they're dealing with, for security reasons IIRC. It was never clear to me why I/O functions should ever be directly callable by a user (and hence open to security issues), but apparently it was enough to purge any that were designed like that from the system, so I wasn't going to go down that road with the patch. Cheers Tom ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] Enums patch v2
Heikki Linnakangas wrote: I'm sorry I missed the original discussions, but I have to ask: Why do we want enums in core? The only potential advantage I can see over using a look-up table and FK references is performance. Well, there are a few things. Sometimes its tidiness, sometimes integrity... I've seen more than one system with hundreds of these things, and they've either gone down the table-per-enum solution, with LOTS of extra tables whose values never change, or the EAV solution, with one or two globally referenced tables to which everything in your system has as a FK, and an integrity check in a trigger if you're very lucky. Yuck on both accounts. Enums hit a sweet spot in the middle and provide data integrity and performance for non-changing values. 1. What's the point of having comparison operators for enums? For most use cases, there's no natural ordering of enum values. Well, there are a number of cases where ordering IS important, and indeed, enums provide a way to do it easily where many of the alternative solutions do not. It's one of the key benefits. 2. The comparison routine compares oids, right? If the oids wrap around when the enum values are created, the ordering isn't what the user expects. As has been pointed out by others quicker on the draw than me, I do sort the OIDs at enum creation time, for exactly this reason. 3. 4 bytes per value is wasteful if you're storing simple status codes etc. Especially if you're doing this for performance, let's do no harm by wasting space. One byte seems enough for the typical use cases. I'd even argue that having a high upper limit on the number of enum values encourages misuse of the feature. I'd really love to have these fit into a 1 or 2 byte value on disk, but AFAIK there's simply no way to do it currently in postgresql. If we ever move to a solution where on-disk representation is noticeably different from in-memory representation, then it might be doable. If that does happen, we might benefit from other improvements such as being able to order columns in a tuple on disk so as to minimize alignment padding, not having to store a composite type's oid, etc. Until that happens, though, if it ever does, this is probably the tightest on-disk representation we're likely to get, unless we're happy to impose some pretty severe restrictions, like 8 bits per enum, and only 256 enums in total (for a 2 byte total). I was already shot down trying to make similar restrictions when I first brought it up. :) The OID solution seems to offend the least. :) We did discuss this somewhat earlier, and I'm happy to take alternative suggestions, but AFAIK this is about as good as it's going to get right now. Cheers Tom ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[PATCHES] Enums patch v2
Hi all Here is an updated version of the enums patch. It has been brought up to date and applies against current CVS HEAD. The original email is at [1], and describes the implementation. This version contains sgml documentation, and contains the missing bounds checks and error codes noted in the last email. As mentioned before, the only part that I'm not super keen on is the hack required to pass the type oid in to the text-to-enum cast function, since normally those take type mods but not type oids. I wasn't sure how else to get a cast working though, short of allowing another type of cast function which accepts type oids as well. That seemed overkill for just one case, though, and was getting a bit beyond the realms of what I wanted to get done with this patch. Anyway, it's reasonably feature complete and should be appropriately documented now, so feedback gratefully accepted. Cheers Tom [1] http://archives.postgresql.org/pgsql-patches/2006-09/msg0.php enums.patch.gz Description: GNU Zip compressed data ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[PATCHES] Enums patch v1
Hi folks Here's a first cut of the enums patch for feedback when people have time. It follows an anyenum pseudo-type approach as foreseen by Nostradamus in one of the original threads. (http://archives.postgresql.org/pgsql-hackers/2005-11/msg00457.php). That made the patch a little more intrusive than I had been hoping for, but hopefully the bits that touch core files like parse_coerce.c are easy to follow. The patch is largely feature complete as far as what I wanted to achieve, although it's currently missing documentation patches. I'll add those at some point soonish. It's currently only been tested on my x86 FC5 box. A few notes: - In much of the code, anyenum is treated identically to anyelement, except when trying to tie down generic parameters: at this point types which aren't enums will be rejected if the declared type is anyenum. Thus there are quite a few places in the patch where ANYENUMOID has just kinda been added in next to ANYELEMENTOID. - The error messages in enum.c need some love. Apparently there's a document that I need to read in the sgml docco which explains error codes to me; currently I've got FIXME TOM comments in there. - One difficulty that I had was trying to write a cast-text-to-enum function. While input functions pass their type oid as a second parameter, all that cast functions can get is the typmod. I did a bit of an ugly hack to pass the enum oid in as an int in that parameter if it's an enum type; it might be cleaner to allow cast functions that take an oid there rather than an int and pass the correct parameter depending on that type. I wasn't comfortable making a change like that in this patch without discussion, though. - It appeared that to be cached in a syscache properly, the enum names had to be a Name/NameData thing rather than a text or whatever, so there's a limit of 64 characters per enum value. 64 characters should be enough for anybody :). Hmm, as I write this I realize that I should probably add some length checking to the create type function. And something prohibiting commas, since they're the array delimiter. Consider that on the TODO as well. :) Do we want to allow a customizable delimiter? IMO someone putting commas inside an enum value is doing something sick and twisted anyway, but maybe there's an argument to be made. They could always hack the delimiter on the array type, although that wouldn't survive a dump. - You can't create generic anyenum functions in the PL's except for plpgsql. You can create functions taking/returning the concrete type, but not anyenum. I don't consider that a huge loss; we couldn't really think of a use case. I haven't added a regression test involving enums to e.g. plperl, although I have tested it with perlpython/tcl. Is that worth doing, or is it being paranoid? - This patch requires an initdb, but I haven't bumped the catalog version number in the patch since I don't know when (if ever) it will be applied Now a few questions from a first-time contributor: - Do I need to call CatalogUpdateIndexes() after I delete rows when dropping the type? Following the code for creating dropping standard user types, it's called on creation but not deletion. - I'm a little unclear about when pfree() should be called. My understanding is that it will deallocate memory from the current memory context, but that context will be chucked once the current statement or whatever is dead. Is it just nice to do when we know that we're done with a particular chunk and to stop the context from growing, or are there other occasions? Or have I misunderstood what it does? - Is pg_indent intended to be run by us mortals? I tried following the instructions in the README to make sure my indentation was kosher, but ran into issues with pgindent complaining about something called detab being missing. It's not on my Fedora box, and some quick Googling didn't unearth an obvious candidate, although a few dodgy looking scripts popped up. Is it a BSD tool? Anyway, all comments / criticisms welcome. Have a look at the regression test to see what should be working. Cheers Tom enums-v1.patch.gz Description: GNU Zip compressed data ---(end of broadcast)--- TIP 6: explain analyze is your friend