On Mon, Apr 6, 2020 at 1:32 PM Magnus Hagander <mag...@hagander.net> wrote:
> Now, if we were just talking about compression, it would actually be
> interesting to implement some sort of "postgres compression API" if
> you will, that is implemented by a shared library. This library could
> then be used from pg_basebackup or from anything else that needs
> compression. And anybody who wants could then do a "<compression X>
> for PostgreSQL" module, removing the need for us to carry such code
> upstream.
I think it could be more general than a compression library. It could be
a store-my-stuff-and-give-it-back-to-me library, which might do
compression or encryption or cloud storage or any combination of the
three, and probably other stuff too. Imagine that you first call an init
function with a namespace that is basically a string provided by the
user. Then you open a file either for read or for write (but not both).
Then you read or write a series of chunks (depending on the file mode).
Then you close the file. Then you can do the same with more files.
Finally, at the end, you close the namespace. You don't really need to
care where or how the functions you are calling store the data; you just
need them to return proper error indicators if by chance they fail. (A
rough sketch of what such an interface might look like is at the end of
this message.)

As compared with my previous proposal, this would work much better for
pg_basebackup -Fp, because you wouldn't launch a new bzip2 process for
every file. You'd just bzopen(), which is presumably quite lightweight
by comparison. The reasons I didn't propose it are:

1. Running bzip2 on every file in a plain-format backup seems a lot
sillier than running it on every tar file in a tar-format backup.

2. I'm not confident that the command specified here actually needs to
be anything very complicated (unlike archive_command).

3. The barrier to entry for a loadable module is a lot higher than for a
shell command.

4. I think that all of our existing infrastructure for loadable modules
is backend-only.

Now, all of these are up for discussion. I am sure we can make the
loadable module stuff work in frontend code; it would just take some
work. A C interface for extensibility is significantly harder to use
than a shell interface, but it's still way better than no interface. The
idea that this shell command can be something simple is my current
belief, but it may turn out to be wrong. And I'm sure somebody can
propose a good reason to do something with every file in a plain-format
backup rather than using tar format.

All that being said, I still find it hard to believe that we will want
to add to PostgreSQL itself the library dependencies we'd need for
encryption or S3 cloud storage. So if we go with this more integrated
approach, we should consider the possibility that, when the dust
settles, PostgreSQL will only have pg_basebackup --output-plugin=lz4
while Aurora will also have pg_basebackup --output-plugin=s3. From my
point of view, that would be less than ideal.
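To make that shape concrete, here is a rough sketch of what such an
interface might look like. To be clear, none of these names exist
anywhere today; this is purely illustrative:

    /*
     * Hypothetical only: a minimal "store my stuff and give it back
     * to me" API of the shape described above.
     */
    #include <stddef.h>
    #include <stdbool.h>
    #include <sys/types.h>

    typedef struct StorageNamespace StorageNamespace;  /* opaque */
    typedef struct StorageFile StorageFile;            /* opaque */

    typedef enum StorageFileMode
    {
        STORAGE_FILE_READ,
        STORAGE_FILE_WRITE
    } StorageFileMode;

    /* Begin a session against a user-provided namespace string. */
    extern StorageNamespace *storage_init(const char *name);

    /* Open a file for read or for write, but not both. */
    extern StorageFile *storage_open_file(StorageNamespace *ns,
                                          const char *filename,
                                          StorageFileMode mode);

    /*
     * Move one chunk in or out.  The implementation might compress,
     * encrypt, or ship the bytes to cloud storage; the caller doesn't
     * care, as long as a failure is reported (here, a negative
     * return value).
     */
    extern ssize_t storage_read_chunk(StorageFile *file,
                                      void *buffer, size_t length);
    extern ssize_t storage_write_chunk(StorageFile *file,
                                       const void *buffer, size_t length);

    /* Close the file; false means something failed (e.g. a flush). */
    extern bool storage_close_file(StorageFile *file);

    /* Finally, end the session. */
    extern bool storage_close_namespace(StorageNamespace *ns);

The caller just runs through that sequence for each file and never needs
to know whether the bytes went through bzip2, through an encryption
library, or off to S3.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company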