[HACKERS] Review: Extensions Patch

David E. Wheeler Fri, 03 Dec 2010 17:09:52 -0800

Extensions Patch v15 Review
===========================

Submission review
-----------------


* Is the patch in context diff format?

Yes.

* Does it apply cleanly to the current git master?

Yes.

* Does it include reasonable tests, necessary doc patches, etc?

`make check` passes.
`make installcheck` fails (regression.diffs attached).
`make -C contrib installcheck` passes

I see tests for pg_execute_sql_string() and replace(), but absolutely nothing 
else. Such an important core feature really should have a pretty comprehensive 
suite of tests exercising all the functionality. Relying on the contrib 
extension tests won't do it, as they exercise only a subset of the 
functionality (e.g., no custom_variable_classes), and then only as a 
side-effect of their actual purpose.

Details on docs below.

Usability review
----------------

Read what the patch is supposed to do, and consider:

* Does the patch actually implement that? 

Yes.

* Do we want that? 

OH YES!

* Do we already have it? 

Nope.

* Does it follow SQL spec, or the community-agreed behavior? 

It's an implementation of community-agreed behavior, as hashed out on the mail 
list over the years.

* Does it include pg_dump support (if applicable)?

Yes, it does.

* Are there dangers? 

Yes. Much of the execution is superuser-only, so we need to make sure that such 
is actually the case (unit tests would help with that).

* Have all the bases been covered?

Mostly, yes. The lack of tests is the single biggest drawback to this patch.

Feature test
------------

Apply the patch, compile it and test:

* Does the feature work as advertised?

Yes.

* Are there corner cases the author has failed to consider?

I had some trouble getting a third-party extension to work. See the end of this 
document for details.

* Are there any assertion failures or crashes?

No.

Performance review
------------------

* Does the patch slow down simple tests? 

Not that I've noticed.

* If it claims to improve performance, does it?

N/A.

* Does it slow down other things?

Not that I've noticed.

Coding review
-------------

Read the changes to the code in detail and consider:

* Does it follow the project 
[http://developer.postgresql.org/pgdocs/postgres/source.html coding 
guidelines]? 

I'm not a C expert, but it looks like it matches quite closely with all the 
other code. It's very neat and well-commented.

* Are there portability issues? 
* Will it work on Windows/BSD etc? 

I can't comment on these.

* Are the comments sufficient and accurate?

Yes.

* Does it do what it says, correctly?

I can't comment on this by looking at the code; see detailed review from a 
user's perspective below.

* Does it produce compiler warnings?

No.

* Can you make it crash?

No.

Architecture review
-------------------

* Is everything done in a way that fits together coherently with other 
features/modules? 
* Are there interdependencies that can cause problems?

Can't really comment on this.

Notes on the documentation
--------------------------

The pg_extension catalog docs are a little thin, but probably sufficient. They 
appear to be duplicated, though, with the entries for catalog-pg-extension and 
view-pg-extensions looking identical. One is apparently a list of installed 
extensions, and the other an SRF that lists installed and available extensions. 
But I find the duplication a bit confusing. (Might be easer if I wasn't reading 
SGML though.)

The extend-extension docs

I don't understand this paragraph at all:

    <para>
     The way to put together a coherent set of custom <acronym>SQL</> objects
     is to create an <literal>extension</literal>. There are two sides of the
     extension, the Operating System side contains several files and
     the <acronym>SQL</> one uses them.
    </para>

This paragraph isn't very clear, either:

       <para>
        The control file can also contain
        a <literal>custom_variable_classes</literal> key followed by any
        number of custom settings,
        named <literal><replaceable 
class="parameter">class</replaceable>.varname</literal>.
        The <literal>custom_variable_classes</literal> value takes effect
        at <command>CREATE EXTENSION</command> time, and is persistent. The
        custom settings are set automatically too, but are not persistent.

Examples might help.


Control file keys:

* comment -- Should mention that it's used for CREATE COMMENT ON EXTENSION, no?
* version -- If it's free-form, how will you do dependency-checking? Or will 
you?
* script
* encoding -- Seems unnecessary; one can explicitly set it in the .sql file, no?
* custom_variable_classes

I don't understand this paragraph:

     <para>
      The admin
      function <link linkend="functions-extension">pg_extension_flag_dump</link>
      can be used to revert the default <literal>pg_dump</literal> policy
      about objects that belong to an extension and force the flagged objects
      to be part of the backups.
     </para>

What's a pg_dump policy?

* There appears to be an irrelevant change to the docs for replace().
* For the pg_read_binary_file() docs, should there not be some sort of note 
about permissions to the file to be read in?

With pg_execute_sql_string() and friends, what if there aren't the same number 
of parameters as there are placeholders? Should be a note about that.

Also, why are the placholders different than everywhere else? I see

+ SELECT pg_execute_sql_string('CREATE SCHEMA @schema@;', '@schema@', 'utils');

and

+ SELECT pg_execute_sql_string('CREATE SCHEMA s;', 's', 'bar');

Why not $1 like is used in PREPARE and user-defined functions? I think the 
second example is particularly weird, as there's nothing to indicate that "s" 
is a placeholder. I don't mind "@schema@", but would like to see uses of such 
things be consistent throughout PostgreSQL (and I expect that one cannot add 
such as placeholders to PREPARE).

In the CREATE EXTENSION docs, I don't understand this section:

      <varlistentry>
       <term>NO USER DATA</term>
       <listitem>
        <para>
         This option is used by <xref linkend="APP-PGDUMP">. As it's possible
         to flag extension's objects so that they get dumped
         (see <xref linkend="functions-extension">), this option allow
         extension authors to prepare installation scripts that will avoid
         creating user data when restoring.
        </para>
       </listitem>
      </varlistentry>

Does it mean that no user data will be dumped when the extension is dumped?

In the DROP EXTENSION docs, I think there's a pasto:

   <refname>DROP EXTENSION</refname>
   <refpurpose>remove a tablespace</refpurpose>

Another one here:

    <varlistentry>
     <term><literal>CASCADE</literal></term>
     <listitem>
      <para>
       Automatically drop objects that depend on the index.
      </para>
     </listitem>
    </varlistentry>

s/index/extension/. Same in the RESTRICT section.

In xfunc.sgml, I see:

       <term><varname>EXTENSION</varname></term>
       <listitem>
        <para>
         extension control files to install
         into 
<literal><replaceable>prefix</replaceable>/share/$MODULEDIR</literal>,
         derived from
         the <literal><replaceable>extension</replaceable>.control.in</literal>
         file to add in your sources. <literal>EXTENSION</literal> can also
         be a list, you then have to provide a <literal>.control.in</literal>
         file per extension in this list.

Why would multiple control files be necessary? And how would it map to 
EXTVERSION? Would all the extensions get the one version in EXTVERSION?

In func.sgml:

        <literal><function>pg_extension_flag_dump(<parameter>objid</> 
<type>oid</>)</function></literal>
       </entry>
       <entry><type>bool</type></entry>
       <entry>Flag an extension's <literal>objectid</literal> as worthy
       of <literal>pg_dump</literal>.</entry>

Why is this necessary? Aren't all database objects "worthy" of being dumped? 
And this:

       <entry>
        <literal><function>pg_extension_with_user_data()</function></literal>
       </entry>
       <entry><type>bool</type></entry>
       <entry>Return <literal>true</literal> only when the extension's
       script should care for user data.</entry>

What constitutes "user data" when it comes to extensions? In contrast, the 
explanation below of "pg_extension_flag_dump" is very clear, making it obvious 
what it's for and why it might be useful. I don't see how it relates to the 
summary above, however. OTOH, it also says:

   <para>
    This function can only be called when loading a <acronym>SQL</> script
    using the command <command>CREATE
    EXTENSION</command>. See <xref linkend="sql-createextension">
    and <xref linkend="extend-extension">.
   </para>

And I have no idea how I'd call that function within another function call. 
Does this mean that only the extension author can call it? Seems like a weird 
scoping without precedent.

Hrm. Looking at extension.c, I see:

+       if (!create_extension)
+               ereport(ERROR,
+                               (errmsg("This function can only be used from 
CREATE EXTENSION.")));

I think that's a confusing error message, frankly. Perhaps better would be 
something like "This function can only be called from SQL files executed by 
CREATE EXTENSION". The docs should make that clearer, too. It was only when I 
saw this error message that I realized how it's scoped.

Now, some of the information in the [wiki 
page](http://wiki.postgresql.org/wiki/Extensions) actually seems more 
informative. For example, the description and examples of \dx, \dx+, and 
pg_extension_objects() are *much* better. Any way to work those into the docs? 
Maybe they don't go into the function reference pages, but perhaps into the 
"Putting it all together" doc? In fact, I think the wiki doc should be copied 
into the "putting it all together" doc almost verbatim (modulo the patch 
excerpts). That will be a big help to prospective extension authors.

Speaking of pg_extension_objects(), it'd be nice if its output identified the 
extension to which each object belongs. Oh, I see, you can ask for only the 
objects in a single extension. That's okay then. But I don't think you need to 
document all the OUT paramters, just the required single argument specifying 
the extension name.

Reviewing the wiki page some more, you have this example (also in the docs, 
IIRC):

+ DO $$
+ BEGIN
+ IF pg_extension_with_user_data() THEN
+   create schema foo;
+   create table foo.bar(id serial primary key);
+   perform pg_extension_flag_dump('foo.bar_id_seq'::regclass);
+   perform pg_extension_flag_dump('foo.bar::regclass);
+ END IF;
+ END;
+ $$;

That's good, but what if someone has restored a database that has those objects 
already? Will the extension be re-created with this code before the tables are 
loaded? Or vice versa? Either way, you're going to have a conflict when you 
restore from a dump.

Code Review
-----------

I find the use of variables in the control files confusing. It looks like this:

    version = 'EXTVERSION'

Why not use make syntax?

    version = $(EXTVERSION)

That makes it much clearer that it's a variable and provides the same syntax a 
the extension developer will already be using in the Makefile.

I've already mentioned that the @extschema@ placeholder is different from any 
other SQL placeholder:

    SET search_path = @extschema@;

I'm not sure of a better option, but I do think it should be made consistent, 
if at all possible, since this is executed in the database, not at `make` time.

Speaking of which, what happens if an extension has no search_path? Or an old 
explicit one?

    SET search_path = public;

Hrm. I'm wondering if it might make more sense to ignore such statements when 
sometning is executed by `pg_execute_sql_file()`? IOW, if I do

    CREATE EXTENSION foo WITH SCHEMA bar;

That the schema would be installed in bar no matter what search_path is set in 
the file loaded from the file system. I think that this would make CREATE 
EXTENSION behave more as expected, *and* allow existing extension distributions 
to just work (modulo the control file). No need for placeholders in the .sql 
file anymore, either. Maybe there's some optoin to CREATE EXTENSION that would 
*not* disable `SET search_path`.

Speaking of the contol file, I think that its inclusion in extension 
distributions should be completely optional. Of the keys supported:

* comment
* version
* script
* encoding
* custom_variable_classes

All except the last one could be written into the Makefile, and then `make` 
would generate the control file for installation. Only if the extension author 
wants to support custom_variable_classes would you need to create an explicit 
control file. And even that could probably be finessed with some sort of Make 
variable.

I'd like to see the control file be something that the extension author doesn't 
have to deal with, as much as possible. It just feels superfluous.

BTW, you might want to put the changes to the existing contrib modules in a 
separate patch, just to simplify things a bit. Not a big deal at this point, 
but it might help the committer to have further separation. But reviewing that 
bit, I was wondering, how are the uninstall scripts handled? Are they necessary 
anymore?

### Questions ###

* I assume that if an extension is installed that includes a DSO that the 
server will need to be HUPed before you can CREATE EXTENSION, yes?

* I trust that CREATE EXTENSION and DROP EXTENSION and ALTER EXTENSION are 
transactional, yes?

* Is there a regtype for extensions?

* Why can only super users get a list of extensions? It might be useful for me 
to be able to check from within a udf to see if a given extension is installed, 
for example. I can see why you wouldn't want to allow listing of extensions 
that are installed on the system but not in the current database.

* Is there no way to keep a list of available extensions somewhere in the 
cluster? Seems kind of silly to parse all the control files from within 
pg_extensions(). That can make a call to pg_extensions() pretty expensive.

* What's with `replace_text_variadic(PG_FUNCTION_ARGS)`? Related?

* \dx+ is not documented in \? in psql. I think you just need to add "[+]".

* `doc/src/sgml/ref/psql-ref.sgml` needs updating too.

* I like that I can see a list of installed and available extensions, but I'm 
not sure that \dx+ is the right command for the latter. Usually + means "more 
columns to provide more information for the same objects as you saw without the 
+". Is there any precedent for showing different objects with + than without?

Exercising the Feature
----------------------

* I built my cluster with all the contrib extensions. \dx+ does a nice job of 
listing what's available. So I ran `CREATE EXTENSION hstore`. It seemed to work 
great, but was *very* verbose. Every SQL statement was emitted to the terminal. 
I thought \echo was turned off by CREATE EXTENSION. I was able to use hstore 
after that; worked great. Yay!

* Wanted to try the schema support, so next did

      CREATE SCHEMA ex;
      CREATE EXTENSION citext WITH SCHEMA ex;

  This was not verbose the way hstore was. And now it's nicely installed:
 
      postgres=# \dx
                               List of extensions
       Schema │  Name  │ Version  │              Description               
      ────────┼────────┼──────────┼────────────────────────────────────────
       ex     │ citext │ 9.1devel │ case-insensitive character string type
       public │ hstore │ 9.1devel │ storing sets of key/value pairs
      (2 rows)

  Very nice. And I was able to use it with `create table users (nickname 
ex.citext primary key);` Awesome.

* Next I tried dropping it. `DROP EXTENSION ex.citext;` did not work. What if I 
have citext installed in two schemas and just want to drop one of them? Ah, I 
see, one cannot have an extension with the same name in different schemas. 
Currently the schema-qualification is a syntax error, but maybe it should be a 
little more informative?

* Without the schema-qualification I get an error because it's in use. But 
`CASCADE`  works. Great.

* The `pg_extension` catalog and `pg_extensions` views are both useful, but too 
close in name. I found it confusing how similar they are. It seems to me like 
the `pg_extension` class is well-named and like the other catalog table names, 
but `pg_extenions` should perhaps be `pg_available_extensions` or something.

Creating a Third Party Extension
--------------------------------

I have a very simple extension I wrote more or less as an example extension for 
PGXN. It's a key/value pair data type called 
[pair](https://github.com/theory/kv-pair). As an experiment, to see how it 
would feel to use extensions as an extension developer, I created [a 
branch](https://github.com/theory/kv-pair/tree/9.1-extension) to port it. With 
that change, I gave it a try:

    export PATH=/usr/local/pgsql-devel/bin:$PATH
    make
    sudo make install
    make installcheck PGUSER=postgres

The test failed with this error:

    ! CREATE EXTENSION pair;
    ! NOTICE:  Installing extension 'pair' from 
'/usr/local/pgsql-devel/share/contrib/pair.sql
    ! ERROR:  could not execute sql file: 
'/usr/local/pgsql-devel/share/contrib/pair.sql'

The file is there and looks fine. I also just tried to install it myself:

     > /usr/local/pgsql-devel/bin/psql -U postgres
    psql (9.1devel)
    Type "help" for help.

    postgres=# create extension pair;
    NOTICE:  Installing extension 'pair' from 
'/usr/local/pgsql-devel/share/contrib/pair.sql', in schema public, with user 
data
    ERROR:  could not execute sql file: 
'/usr/local/pgsql-devel/share/contrib/pair.sql'

It would be nice if it gave me more information. What failed, exactly? It also 
fails when I use `WITH SCHEMA public`.

I assume this will be a simple issue. Overall I'm happy with how easy it will 
be to update existing extensions to use the new functionality, but as I've 
mentioned above, I do think that it would be better if:

* One could specify stuff *only* in the `Makefile` and let `make` create a 
control file.

* I could omit the @extschema@ business altogether and just let CREATE 
EXTENSION set the path for the duration of its execution.

With these two changes, it would be *even easier* for existing third-party 
extensions to be adapted.

Conclusion
----------

Overall I'm very happy with this patch. There are some nits I've brought up 
here that need to be addressed, and I do think there are some design changes 
that would be worth considering. So I'm marking it as returned. But I also 
think that someone more familiar with the core code should do a proper code 
review during this commitfest (I'm mostly about functionality and interface).

But I think it's close, and I can't wait to have this stuff solidly in core.

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Review: Extensions Patch

Reply via email to