Re: extensions are hitting the ceiling

2019-04-21 Thread Noah Misch
On Tue, Apr 16, 2019 at 04:24:20AM -0500, Eric Hanson wrote:
> On Tue, Apr 16, 2019 at 12:47 AM Noah Misch  wrote:
> > https://www.postgresql.org/message-id/20180710014308.ga805...@rfd.leadboat.com
> >
> > The @DEPNAME_schema@ thing was trivial to implement, but I shelved it.
> > I'm attaching the proof of concept, for your information.

> Why shelved?  I like it.  You said you lean toward 2b in the link above,
> but there is no 2b :-) but 1b was this option, which maybe you meant?

(2) is a mutation of (1), so (2b) exists by mutating (1b) according to the
description of (2).  In other words, (2b) would be this:

  Drop relocatable=true from extensions that have cause to do so (by adding a
  new version number and versioned control file): cube, earthdistance,
  pageinspect, pg_freespacemap, xml2.  Do likewise for others as needed in the
  future.  To relocate an affected extension, drop and recreate it.  Warn
  about relocatable=true in non-core extensions.  Expand @DEPNAME_schema@ in
  extension SQL files.  Use @cube_schema@ to refer to the right objects.

I shelved it because thread
http://postgr.es/m/flat/20180830070609.ga1485...@rfd.leadboat.com did not
accept it as a solution for contrib/ extensions.  If it's not good enough for
contrib/, it's not good enough for this problem space.

> The other approach would be to have each extension be in it's own schema,
> whose name is fixed for life.  Then there are no collisions and no
> ambiguity about their location.   I don't use NPM but was just reading
> about how they converted their package namespace from a single global
> namespace with I think it was 30k packages in it,
> to @organization/packagename.  I don't know how folks would feel about a
> central namespace registry, I don't love the idea if we can find a way
> around it, but would settle for it if there's no better solution.  Either
> that or use a UUID as the schema name.  Truly hideous.  But it seems like
> your approach above with just dynamically looking up the extension's schema
> as a variable would solve everything.

That's like how C/C++/Java identifiers work, turning each @DEPNAME_schema@
into a constant.  If we were starting from scratch, that's attractive.
Unfortunately, folks have applications that expect to use e.g. public.earth().
We'd need a big benefit to justify obligating those users to migrate.  If we
had @DEPNAME_schema@, communities of users could decide to adopt a local
convention of a fixed schema per extension.  Other communities of users,
particularly those with substantial stable code, could retain their current
schema usage patterns.

Thanks,
nm




Re: extensions are hitting the ceiling

2019-04-17 Thread Jiří Fejfar
Hi all!

I am sending our comments to mentioned issues. I was trying to send it
month ago 
(https://www.postgresql.org/message-id/CA%2B8wVNUOt2Bh4x7YQEVoq5BfP%3DjM-F6cDYKxJiTODG_VCGhUVQ%40mail.gmail.com),
but it somehow doesn't append in the "thread" (sorry, I am new in
mailing list practice...).

My colleague already posted some report to bug mailing list
(https://www.postgresql.org/message-id/15616-260dc9cb3bec7...@postgresql.org)
but with no response.

On Tue, 19 Mar 2019 at 02:38, Eric Hanson  wrote:
>
> Hi folks,
>
> After months and years of really trying to make EXTENSIONs meet the 
> requirements of my machinations, I have come to the conclusion that either a) 
> I am missing something or b) they are architecturally flawed.  Or possibly 
> both.
>
> Admittedly, I might be trying to push extensions beyond what the great 
> elephant in the sky ever intended. The general bent here is to try to achieve 
> a level of modular reusable components similar to those in "traditional" 
> programming environments like pip, gem, npm, cpan, etc. Personally, I am 
> trying to migrate as much of my dev stack as possible away from the 
> filesystem and into the database. Files, especially code, configuration, 
> templates, permissions, manifests and other development files, would be much 
> happier in a database where they have constraints and an information model 
> and can be queried!
>
> Regardless, it would be really great to be able to install an extension, and 
> have it cascade down to multiple other extensions, which in turn cascade down 
> to more, and have everything just work. Clearly, this was considered in the 
> extension architecture, but I'm running into some problems making it a 
> reality.  So here they are.
>
>
> #1: Dependencies
>
> Let's say we have two extensions, A and B, both of which depend on a third 
> extension C, let's just say C is hstore.  A and B are written by different 
> developers, and both contain in their .control file the line
>
> requires = 'hstore'
>
> When A is installed, if A creates a schema, it puts hstore in that schema. If 
> not, hstore is already installed, it uses it in that location.  How does the 
> extension know where to reference hstore?
>
> Then, when B is installed, it checks to see if extension hstore is installed, 
> sees that it is, and moves on.  What if it expects it in a different place 
> than A does? The hstore extension can only be installed once, in a single 
> schema, but if multiple extensions depend on it and look for it in different 
> places, they are incompatible.
>
> I have heard talk of a way to write extensions so that they dynamically 
> reference the schema of their dependencies, but sure don't know how that 
> would work if it's possible.  The @extschema@ variable references the 
> *current* extension's schema, but not there is no dynamic variable to 
> reference the schema of a dependency.
>
> Also it is possible in theory to dynamically set search_path to contain every 
> schema of every dependency in play and then just not specify a schema when 
> you use something in a dependency.  But this ANDs together all the scopes of 
> all the dependencies of an extension, introducing potential for collisions, 
> and is generally kind of clunky.
>

It is not possible to specify the version of extension we are
dependent on in .control file.

> #2:  Data in Extensions
>
> Extensions that are just a collection of functions and types seem to be the 
> norm.  Extensions can contain what the docs call "configuration" data, but 
> rows are really second class citizens:  They aren't tracked with 
> pg_catalog.pg_depend, they aren't deleted when the extension is dropped, etc.
>
> Sometimes it would make sense for an extension to contain *only* data, or 
> insert some rows in a table that the extension doesn't "own", but has as a 
> dependency.  For example, a "webserver" extension might contain a "resource" 
> table that serves up the content of resources in the table at a specified 
> path. But then, another extension, say an app, might want to just list the 
> webserver extension as a dependency, and insert a few resource rows into it.  
> This is really from what I can tell beyond the scope of what extensions are 
> capable of.
>

I am not sure about the name "Configuration" Tables. From my point of
view extensions can hold two sorts of data:
1) "static" data: delivered with extension, inserted by update
scripts; the same "static" data are present across multiple
installation of extension in the same version. This data are not
supposed to be dumped.
2) "dynamic" data: inserted by users, have to be included in dumps,
are marked with pg_extension_config_dump and are called
"configuration" tables/data ... but why "configuration"?

>
> #3 pg_dump and Extensions
>
> Tables created by extensions are skipped by pg_dump unless they are flagged 
> at create time with:
>
> pg_catalog.pg_extension_config_dump('my_table', 'where id < 20')
>
> 

Re: extensions are hitting the ceiling

2019-04-16 Thread Eric Hanson
On Tue, Apr 16, 2019 at 4:47 AM Eric Hanson  wrote:

> We would probably be wise to learn from what has gone (so I hear) terribly
> wrong with the Node / NPM packaging system (and I'm sure many before it),
> namely versioning.  What happens when two extensions require different
> versions of the same extension?  At a glance it almost seems unsolvable,
> given the constraint that an extension can only be installed once, and only
> at a single version.  I don't understand why that constraint exists though.
>

How about this:

1. Extension can be installed once *per-version*.
2. Each version of an extension that is installed is assigned by the system
a unique, hidden schema (similar to temp table schemas) whose name doesn't
matter because the extension user will never need to know it.
3. There exists a dynamic variable, like you proposed above, but it
includes version number as well.  @DEPNAME_VERSION_schema@ perhaps.  This
variable would resolve to the system-assigned schema name of the extension
specified, at the version specified.
4. Since sprinkling ones code with version numbers is awful, there exists a
way (which I haven't thought of) to set a kind of search_path-type setting
which sets in the current scope the version number of the extension that
should be dereferenced, so developers can still use @DEPNAME_schema@.

This would allow multiple versions of extensions to coexist, and it would
solve the problem with two extensions wanting the same dependency in
different places.

It's radical, but extensions are radically broken.  A critique of the above
would be that extensions still have a single global namespace, so
personally I don't think it even goes far enough.

Cheers,
Eric


Re: extensions are hitting the ceiling

2019-04-16 Thread Eric Hanson
On Tue, Apr 16, 2019 at 4:24 AM Eric Hanson  wrote:

>
>
> On Tue, Apr 16, 2019 at 12:47 AM Noah Misch  wrote:
>
>> On Mon, Mar 18, 2019 at 09:38:19PM -0500, Eric Hanson wrote:
>> > I have heard talk of a way to write extensions so that they dynamically
>> > reference the schema of their dependencies, but sure don't know how that
>> > would work if it's possible.  The @extschema@ variable references the
>> > *current* extension's schema, but not there is no dynamic variable to
>> > reference the schema of a dependency.
>>
>> If desperate, you can do it like this:
>>
>>   DO $$ BEGIN EXECUTE format('SELECT %I.earth()',
>> (SELECT nspname FROM pg_namespace n
>>  JOIN pg_extension ON n.oid = extnamespace
>>  WHERE extname = 'earthdistance' )); END $$;
>>
>> Needless to say, that's too ugly.  Though probably unimportant in
>> practice, it
>> also has a race condition vs. ALTER EXTENSION SET SCHEMA.
>>
>> > Also it is possible in theory to dynamically set search_path to contain
>> > every schema of every dependency in play and then just not specify a
>> schema
>> > when you use something in a dependency.  But this ANDs together all the
>> > scopes of all the dependencies of an extension, introducing potential
>> for
>> > collisions, and is generally kind of clunky.
>>
>> That's how it works today, and it has the problems you describe.  I
>> discussed
>> some solution candidates here:
>>
>> https://www.postgresql.org/message-id/20180710014308.ga805...@rfd.leadboat.com
>>
>> The @DEPNAME_schema@ thing was trivial to implement, but I shelved it.
>> I'm
>> attaching the proof of concept, for your information.
>>
>
> Interesting.
>
> Why shelved?  I like it.  You said you lean toward 2b in the link above,
> but there is no 2b :-) but 1b was this option, which maybe you meant?
>
> The other approach would be to have each extension be in it's own schema,
> whose name is fixed for life.  Then there are no collisions and no
> ambiguity about their location.   I don't use NPM but was just reading
> about how they converted their package namespace from a single global
> namespace with I think it was 30k packages in it,
> to @organization/packagename.  I don't know how folks would feel about a
> central namespace registry, I don't love the idea if we can find a way
> around it, but would settle for it if there's no better solution.  Either
> that or use a UUID as the schema name.  Truly hideous.  But it seems like
> your approach above with just dynamically looking up the extension's schema
> as a variable would solve everything.
>
> There is the problem of sequencing, where extension A installs dependency
> extension B in it's own schema.  Then extension C also wants to use
> dependency B, but extension A is uninstalled and extension B is now still
> hanging around in A's old schema.  Not ideal but at least everything would
> still function.
>
> I'll keep thinking about it...
>

We would probably be wise to learn from what has gone (so I hear) terribly
wrong with the Node / NPM packaging system (and I'm sure many before it),
namely versioning.  What happens when two extensions require different
versions of the same extension?  At a glance it almost seems unsolvable,
given the constraint that an extension can only be installed once, and only
at a single version.  I don't understand why that constraint exists though.

Eric


Re: extensions are hitting the ceiling

2019-04-16 Thread Eric Hanson
On Tue, Apr 16, 2019 at 12:47 AM Noah Misch  wrote:

> On Mon, Mar 18, 2019 at 09:38:19PM -0500, Eric Hanson wrote:
> > I have heard talk of a way to write extensions so that they dynamically
> > reference the schema of their dependencies, but sure don't know how that
> > would work if it's possible.  The @extschema@ variable references the
> > *current* extension's schema, but not there is no dynamic variable to
> > reference the schema of a dependency.
>
> If desperate, you can do it like this:
>
>   DO $$ BEGIN EXECUTE format('SELECT %I.earth()',
> (SELECT nspname FROM pg_namespace n
>  JOIN pg_extension ON n.oid = extnamespace
>  WHERE extname = 'earthdistance' )); END $$;
>
> Needless to say, that's too ugly.  Though probably unimportant in
> practice, it
> also has a race condition vs. ALTER EXTENSION SET SCHEMA.
>
> > Also it is possible in theory to dynamically set search_path to contain
> > every schema of every dependency in play and then just not specify a
> schema
> > when you use something in a dependency.  But this ANDs together all the
> > scopes of all the dependencies of an extension, introducing potential for
> > collisions, and is generally kind of clunky.
>
> That's how it works today, and it has the problems you describe.  I
> discussed
> some solution candidates here:
>
> https://www.postgresql.org/message-id/20180710014308.ga805...@rfd.leadboat.com
>
> The @DEPNAME_schema@ thing was trivial to implement, but I shelved it.
> I'm
> attaching the proof of concept, for your information.
>

Interesting.

Why shelved?  I like it.  You said you lean toward 2b in the link above,
but there is no 2b :-) but 1b was this option, which maybe you meant?

The other approach would be to have each extension be in it's own schema,
whose name is fixed for life.  Then there are no collisions and no
ambiguity about their location.   I don't use NPM but was just reading
about how they converted their package namespace from a single global
namespace with I think it was 30k packages in it,
to @organization/packagename.  I don't know how folks would feel about a
central namespace registry, I don't love the idea if we can find a way
around it, but would settle for it if there's no better solution.  Either
that or use a UUID as the schema name.  Truly hideous.  But it seems like
your approach above with just dynamically looking up the extension's schema
as a variable would solve everything.

There is the problem of sequencing, where extension A installs dependency
extension B in it's own schema.  Then extension C also wants to use
dependency B, but extension A is uninstalled and extension B is now still
hanging around in A's old schema.  Not ideal but at least everything would
still function.

I'll keep thinking about it...


> > #2:  Data in Extensions
> >
> > Extensions that are just a collection of functions and types seem to be
> the
> > norm.  Extensions can contain what the docs call "configuration" data,
> but
> > rows are really second class citizens:  They aren't tracked with
> > pg_catalog.pg_depend, they aren't deleted when the extension is dropped,
> > etc.
> >
> > Sometimes it would make sense for an extension to contain *only* data, or
> > insert some rows in a table that the extension doesn't "own", but has as
> a
> > dependency.  For example, a "webserver" extension might contain a
> > "resource" table that serves up the content of resources in the table at
> a
> > specified path. But then, another extension, say an app, might want to
> just
> > list the webserver extension as a dependency, and insert a few resource
> > rows into it.  This is really from what I can tell beyond the scope of
> what
> > extensions are capable of.
>
> I never thought of this use case.  Interesting.
>

It's a *really* powerful pattern.  I am sure of this because I've been
exploring it while developing a row packaging system modeled after git [1],
and using it in conjunction with EXTENSIONs with extreme joy.  But one does
rows, and the other does DDL, and this is not ideal.

Cheers,
Eric

[1]
https://github.com/aquametalabs/aquameta/tree/master/src/pg-extension/bundle


Re: extensions are hitting the ceiling

2019-04-15 Thread Noah Misch
On Mon, Mar 18, 2019 at 09:38:19PM -0500, Eric Hanson wrote:
> #1: Dependencies
> 
> Let's say we have two extensions, A and B, both of which depend on a third
> extension C, let's just say C is hstore.  A and B are written by different
> developers, and both contain in their .control file the line
> 
> requires = 'hstore'
> 
> When A is installed, if A creates a schema, it puts hstore in that schema.
> If not, hstore is already installed, it uses it in that location.  How does
> the extension know where to reference hstore?
> 
> Then, when B is installed, it checks to see if extension hstore is
> installed, sees that it is, and moves on.  What if it expects it in a
> different place than A does? The hstore extension can only be installed
> once, in a single schema, but if multiple extensions depend on it and look
> for it in different places, they are incompatible.
> 
> I have heard talk of a way to write extensions so that they dynamically
> reference the schema of their dependencies, but sure don't know how that
> would work if it's possible.  The @extschema@ variable references the
> *current* extension's schema, but not there is no dynamic variable to
> reference the schema of a dependency.

If desperate, you can do it like this:

  DO $$ BEGIN EXECUTE format('SELECT %I.earth()',
(SELECT nspname FROM pg_namespace n
 JOIN pg_extension ON n.oid = extnamespace
 WHERE extname = 'earthdistance' )); END $$;

Needless to say, that's too ugly.  Though probably unimportant in practice, it
also has a race condition vs. ALTER EXTENSION SET SCHEMA.

> Also it is possible in theory to dynamically set search_path to contain
> every schema of every dependency in play and then just not specify a schema
> when you use something in a dependency.  But this ANDs together all the
> scopes of all the dependencies of an extension, introducing potential for
> collisions, and is generally kind of clunky.

That's how it works today, and it has the problems you describe.  I discussed
some solution candidates here:
https://www.postgresql.org/message-id/20180710014308.ga805...@rfd.leadboat.com

The @DEPNAME_schema@ thing was trivial to implement, but I shelved it.  I'm
attaching the proof of concept, for your information.

> #2:  Data in Extensions
> 
> Extensions that are just a collection of functions and types seem to be the
> norm.  Extensions can contain what the docs call "configuration" data, but
> rows are really second class citizens:  They aren't tracked with
> pg_catalog.pg_depend, they aren't deleted when the extension is dropped,
> etc.
> 
> Sometimes it would make sense for an extension to contain *only* data, or
> insert some rows in a table that the extension doesn't "own", but has as a
> dependency.  For example, a "webserver" extension might contain a
> "resource" table that serves up the content of resources in the table at a
> specified path. But then, another extension, say an app, might want to just
> list the webserver extension as a dependency, and insert a few resource
> rows into it.  This is really from what I can tell beyond the scope of what
> extensions are capable of.

I never thought of this use case.  Interesting.
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 2e45381..cd061ea 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -786,13 +786,14 @@ static void
 execute_extension_script(Oid extensionOid, ExtensionControlFile *control,
 const char *from_version,
 const char *version,
+List *requiredExtensions,
 List *requiredSchemas,
 const char *schemaName, Oid 
schemaOid)
 {
char   *filename;
int save_nestlevel;
StringInfoData pathbuf;
-   ListCell   *lc;
+   ListCell   *lc, *le;
 
/*
 * Enforce superuser-ness if appropriate.  We postpone this check until
@@ -907,6 +908,19 @@ execute_extension_script(Oid extensionOid, 
ExtensionControlFile *control,
}
 
/*
+* For each dependency, substitute the dependency's schema for
+* @DEPNAME_schema@.  This is fishy for a relocatable 
dependency, but
+* we accept that risk.
+*/
+   forboth (le, requiredExtensions, lc, requiredSchemas)
+   {
+   t_sql = DirectFunctionCall3(replace_text,
+   
t_sql,
+   
CStringGetTextDatum(psprintf("@%s_schema@", 
get_extension_name(lfirst_oid(le,
+   

Re: extensions are hitting the ceiling

2019-03-19 Thread Eric Hanson
On Mon, Mar 18, 2019 at 11:56 PM Chapman Flack 
wrote:

> On 03/18/19 22:38, Eric Hanson wrote:
> > rows are really second class citizens:  They aren't tracked with
> > pg_catalog.pg_depend, they aren't deleted when the extension is dropped,
> > etc.
>
> This. You have other interests as well, but this is the one I was thinking
> about a few years ago in [1] (starting at "Ok, how numerous would be the
> problems with this:").
>

Cool!

First thoughts, it seems like a sensible way to go given the premise that
extensions are immutable.  But -- I'd be a little concerned about the
performance ramifications.  Usually there are not jillions of database
objects in a extension, but if they started containing data, there sure
could be jillions of rows.  Every row would have to be checked for
existence as part of an extension on every insert or update, no?

Nobody ever chimed in to say how numerous they did or didn't think the
> problems would be. I was actually thinking recently about sitting down
> and trying to write that patch, as no one had exactly stood up to say
> "oh heavens no, don't write that." But my round tuits are all deployed
> elsewhere at the moment.
>

Likewise, if nobody tells me "oh sheeze extensions can already do all this"
I'm going to assume they can't. :-)

I'd still like to discuss the ideas.


Me too!

Ok, I should probably come out and say it:  I think the user story of
"There is some kind of packaging system that can contain both schema and
data, and these packages can be installed and removed along with their
dependencies atomically" is fairly obvious and desirable.  But getting
there while accepting the premises that are currently baked into extensions
might be a tall order.

Extensions have a middleware-ish aspect to them -- they are immutable and
that immutability is checked and enforced at runtime.  That might scale
just fine to a few dozen database objects that only check pg_depends on DDL
operations, but if we introduce record tracking and start sticking sticks
into the wheels of the DML, things could go south really quickly it seems.

I really like a more git-like pattern, where you are free to modify the
working copy of a repository (or in this case an extension), and instead of
being blocked from changing things, the system tells the user what has
changed and how, and gives sensible options for what to do about it.  That
way it doesn't incur a performance hit, and the user can do a kind of "git
status" on their extension to show any changes.

How about an extension system whose first principle is that an extension is
made up of rows, period.  What about the DDL you ask?  Well...

Imagine a system catalog whose sole purpose is to contain database object
definitions like "CREATE VIEW ...", similar to those produced by
pg_catalog.pg_get_viewdef(), pg_catalog.get_functiondef(), etc.  Let's call
this catalog `def`. There is exactly one VIEW for every type of database
object in PostgreSQL. def.table, def.role, def.sequence, def.operator,
def.type, etc. Each def.* VIEW contains only two columns, `id` and
`definition`.  The `id` column contains a unique identifier for the object,
and the `definition` column contains the SQL statement that will recreate
the object.

So, inside this system catalog is the SQL definition statement of every
database object.  In theory, the contents of all the `definition` columns
together would be similar to the contents of pg_dump --schema-only.

Now, imagine all these def.* views had insert triggers, so that on insert,
it actually executes the contents of the `definition` column.  In theory,
we could pg_restore the data in the def.* views, and it would recreate all
the database objects. It could shift all that logic out of pg_dump and into
the database.

So using the def.* catalog, we could package both "regular" table data and
system objects via the contents of the def.* catalog views.  Packages are a
collection rows, period. Build up from there.

I'm working on a prototype called bundle [1], it still has a ways to go but
it's showing some promise.  It is going to require brining into PostgreSQL
the missing pg_get_*def functions, as folks have talked about before [2].

Thanks,
Eric

[1]
https://github.com/aquametalabs/aquameta/tree/master/src/pg-extension/bundle
[2]
https://www.postgresql.org/message-id/20130429234634.ga10...@tornado.leadboat.com


Re: extensions are hitting the ceiling

2019-03-19 Thread Jiří Fejfar
Hi all!

We are also facing some issues when using extensions. We are using
them quite intensively as a tool for maintaining our custom "DB
applications" with versioning, all tables, data, regression tests...
We find extensions great! We do not need other tool like flyway. My
colleague already posted some report to bug mailing list [1] but with
no response.

Our observations correspond well with your outline:

#1: Dependencies

* It is not possible to specify the version of extension we are
dependent on in .control file.

#2: Data in Extensions

I am not sure about the name "Configuration" Tables. From my point of
view extensions can hold two sorts of data:
1) "static" data: delivered with extension, inserted by update
scripts; the same "static" data are present across multiple
installation of extension in the same version. This data are not
supposed to be dumped.
2) "dynamic" data: inserted by users, have to be included in dumps,
are marked with pg_extension_config_dump and are called
"configuration" tables/data ... but why "configuration"?

#3 pg_dump and Extensions

* We have described some behavior of pg_dump, which we believe are in
fact bugs: see [1] "1) pg_dump with --schema parameter" and "2)
Hanging OID in extconfig".
* Maybe it would be good to introduce new switch pg_dump --extension
extA dumping all "dynamic" data from extension tables regardless on
schema

#4: Extension owned

* It is not possible to alter extension owner

Thanks, Jiří & Ivo.

[1] https://www.postgresql.org/message-id/15616-260dc9cb3bec7...@postgresql.org



Re: extensions are hitting the ceiling

2019-03-18 Thread Chapman Flack
On 03/19/19 00:56, Chapman Flack wrote:
> Nobody ever chimed in to say how numerous they did or didn't think the
> problems would be. I was actually thinking recently about sitting down
> and trying to write that patch, as no one had exactly stood up to say
> "oh heavens no, don't write that."

Of course, one notable thing that has happened since I wrote that design
was that Oids have stopped being magical, or supported in user tables.
So a bit of "mutatis mutandis" is needed when reading it in 2019.

Regards,
-Chap



Re: extensions are hitting the ceiling

2019-03-18 Thread Chapman Flack
On 03/18/19 22:38, Eric Hanson wrote:
> rows are really second class citizens:  They aren't tracked with
> pg_catalog.pg_depend, they aren't deleted when the extension is dropped,
> etc.

This. You have other interests as well, but this is the one I was thinking
about a few years ago in [1] (starting at "Ok, how numerous would be the
problems with this:").

Nobody ever chimed in to say how numerous they did or didn't think the
problems would be. I was actually thinking recently about sitting down
and trying to write that patch, as no one had exactly stood up to say
"oh heavens no, don't write that." But my round tuits are all deployed
elsewhere at the moment.

I'd still like to discuss the ideas.

-Chap

[1] https://www.postgresql.org/message-id/5685A2E7.6080209%40anastigmatix.net



extensions are hitting the ceiling

2019-03-18 Thread Eric Hanson
Hi folks,

After months and years of really trying to make EXTENSIONs meet the
requirements of my machinations, I have come to the conclusion that either
a) I am missing something or b) they are architecturally flawed.  Or
possibly both.

Admittedly, I might be trying to push extensions beyond what the great
elephant in the sky ever intended. The general bent here is to try to
achieve a level of modular reusable components similar to those in
"traditional" programming environments like pip, gem, npm, cpan, etc.
Personally, I am trying to migrate as much of my dev stack as possible away
from the filesystem and into the database. Files, especially code,
configuration, templates, permissions, manifests and other development
files, would be much happier in a database where they have constraints and
an information model and can be queried!

Regardless, it would be really great to be able to install an extension,
and have it cascade down to multiple other extensions, which in turn
cascade down to more, and have everything just work. Clearly, this was
considered in the extension architecture, but I'm running into some
problems making it a reality.  So here they are.


#1: Dependencies

Let's say we have two extensions, A and B, both of which depend on a third
extension C, let's just say C is hstore.  A and B are written by different
developers, and both contain in their .control file the line

requires = 'hstore'

When A is installed, if A creates a schema, it puts hstore in that schema.
If not, hstore is already installed, it uses it in that location.  How does
the extension know where to reference hstore?

Then, when B is installed, it checks to see if extension hstore is
installed, sees that it is, and moves on.  What if it expects it in a
different place than A does? The hstore extension can only be installed
once, in a single schema, but if multiple extensions depend on it and look
for it in different places, they are incompatible.

I have heard talk of a way to write extensions so that they dynamically
reference the schema of their dependencies, but sure don't know how that
would work if it's possible.  The @extschema@ variable references the
*current* extension's schema, but not there is no dynamic variable to
reference the schema of a dependency.

Also it is possible in theory to dynamically set search_path to contain
every schema of every dependency in play and then just not specify a schema
when you use something in a dependency.  But this ANDs together all the
scopes of all the dependencies of an extension, introducing potential for
collisions, and is generally kind of clunky.


#2:  Data in Extensions

Extensions that are just a collection of functions and types seem to be the
norm.  Extensions can contain what the docs call "configuration" data, but
rows are really second class citizens:  They aren't tracked with
pg_catalog.pg_depend, they aren't deleted when the extension is dropped,
etc.

Sometimes it would make sense for an extension to contain *only* data, or
insert some rows in a table that the extension doesn't "own", but has as a
dependency.  For example, a "webserver" extension might contain a
"resource" table that serves up the content of resources in the table at a
specified path. But then, another extension, say an app, might want to just
list the webserver extension as a dependency, and insert a few resource
rows into it.  This is really from what I can tell beyond the scope of what
extensions are capable of.


#3 pg_dump and Extensions

Tables created by extensions are skipped by pg_dump unless they are flagged
at create time with:

pg_catalog.pg_extension_config_dump('my_table', 'where id < 20')

However, there's no way that I can tell to mix and match rows and tables
across multiple extensions, so pg_dump can't keep track of multiple
extensions that contain rows in the same table.


I'd like an extension framework that can contain data as first class
citizens, and can gracefully handle a dependency chain and share
dependencies.  I have some ideas for a better approach, but they are pretty
radical.  I thought I would send this out and see what folks think.

Thanks,
Eric
--
http://aquameta.org/