Re: Extensible storage manager API - smgr hooks

2022-08-25 Thread Andrey Borodin



> On 16 Jun 2022, at 13:41, Kirill Reshke  wrote:
> 
> Hello Yura and Anastasia.

FWIW this technology is now a part of Greenplum [0]. We are building a GP
extension that automatically offloads cold data to S3 - a very simplified
version of Neon for analytical workloads.
When a segment of a table is not used for a long period of time, the extension
will sync its files with backup storage in the cloud.
When the user touches the data, the extension's smgr will bring the table
segments back from the backup or the latest synced version.

Our #1 goal is to provide a tool useful for the community. We can easily
provide the same extension for Postgres if this technology (extensible smgr) is
in core. Does such an extension seem useful for Postgres? Or does this data
access pattern seem unusual for Postgres? By pattern I mean vast amounts of
cold data that are only ever appended and never touched.


Best regards, Andrey Borodin.

[0] https://github.com/greenplum-db/gpdb/pull/13601



Re: Extensible storage manager API - smgr hooks

2022-06-17 Thread Kirill Reshke
Hello Yura and Anastasia.

I have tried to implement the per-relation SMGR approach, and ran into a
serious problem with redo.

To implement the per-relation SMGR feature I tried to do things similarly
to the custom table AM approach: we define our custom SMGR in an extension
(which defines the smgr handler) and then use this SMGR in the relation
definition, like this:

```
postgres=# create extension proxy_smgr ;
CREATE EXTENSION
postgres=# select * from pg_smgr ;
  oid  |  smgrname  |    smgrhandler
-------+------------+--------------------
  4646 | md         | smgr_md_handler
 16386 | proxy_smgr | proxy_smgr_handler
(2 rows)

postgres=# create table tt(i int) storage manager proxy_smgr_handler;
ERROR:  storage manager "proxy_smgr_handler" does not exist
postgres=# create table tt(i int) storage manager proxy_smgr;
INFO:  proxy open 1663 5 16391
INFO:  proxy create 16391
INFO:  proxy close, 16391
INFO:  proxy close, 16391
INFO:  proxy close, 16391
INFO:  proxy close, 16391
CREATE TABLE
postgres=# select * from tt;
INFO:  proxy open 1663 5 16391
INFO:  proxy nblocks 16391
INFO:  proxy nblocks 16391
 i
---
(0 rows)

postgres=# insert into tt values(1);
INFO:  proxy exists 16391
INFO:  proxy nblocks 16391
INFO:  proxy nblocks 16391
INFO:  proxcy extend 16391
INSERT 0 1
postgres=# select * from tt;
INFO:  proxy nblocks 16391
INFO:  proxy nblocks 16391
 i
---
 1
(1 row)
```

The extension's SQL file looks like this:

```
CREATE FUNCTION proxy_smgr_handler(internal)
RETURNS table_smgr_handler
AS 'MODULE_PATHNAME'
LANGUAGE C;

-- Storage manager
CREATE STORAGE MANAGER proxy_smgr HANDLER proxy_smgr_handler;
```

To do this I have defined a catalog relation, pg_smgr, where I store the
smgr handlers, and I use this relation in the smgropen function when we need
to open other (non-catalog) relations. The patch almost passes the regression
tests (8 of 214 tests fail), but it fails on the first checkpoint or in crash
recovery. I have also changed the WAL format, adding the SMGR oid to each WAL
record that carries a RelFileNode structure. Why do we need WAL changes? Well,
I tried to solve the following issue.

As I mentioned, there is a problem with redo, which is: we cannot do a
syscache search to get the relation's SMGR in order to apply WAL, because the
syscache is not initialized during redo (crash recovery). As I understand it,
the syscache is not initialized because the system catalogs are not consistent
until crash recovery is done.
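
To make the constraint concrete, here is a toy model in plain C (not the patch's code). It assumes the WAL record carries an smgr identifier next to the RelFileNode, as described above, and that redo resolves it from a process-local table instead of the catalogs. XLogBlockRef, SmgrEntry, smgr_table_add and redo_resolve_smgr are hypothetical names. How such a table would be populated before recovery starts is left open, since pg_smgr OIDs like 16386 are only assigned at CREATE EXTENSION time.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;

/* Simplified stand-in for PostgreSQL's RelFileNode. */
typedef struct RelFileNode
{
	Oid			spcNode;		/* tablespace */
	Oid			dbNode;			/* database */
	Oid			relNode;		/* relation */
} RelFileNode;

/* Hypothetical WAL block reference: RelFileNode plus the smgr identifier. */
typedef struct XLogBlockRef
{
	RelFileNode rnode;
	Oid			smgr_id;
} XLogBlockRef;

/* Hypothetical in-memory entry describing one storage manager. */
typedef struct SmgrEntry
{
	Oid			smgr_id;
	const char *name;
} SmgrEntry;

#define MAX_SMGRS 8				/* arbitrary limit for this sketch */

static SmgrEntry smgr_table[MAX_SMGRS];
static int	n_smgrs = 0;

/* Add an entry; returns its slot, or -1 if the table is full. */
static int
smgr_table_add(Oid smgr_id, const char *name)
{
	assert(name != NULL);
	if (n_smgrs >= MAX_SMGRS)
		return -1;
	smgr_table[n_smgrs].smgr_id = smgr_id;
	smgr_table[n_smgrs].name = name;
	return n_smgrs++;
}

/* Redo-time lookup: memory only, no catalog access. NULL if unknown. */
static const SmgrEntry *
redo_resolve_smgr(const XLogBlockRef *ref)
{
	for (int i = 0; i < n_smgrs; i++)
	{
		if (smgr_table[i].smgr_id == ref->smgr_id)
			return &smgr_table[i];
	}
	return NULL;
}
```

The point of the sketch is only that the lookup key has to travel in the WAL record and the lookup itself must not touch the syscache. An unknown identifier returns NULL here, and real redo code would have to decide what to do in that case.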


So, that's it. I decided to write to this thread to get feedback and
understand how best to solve the problem with redo.

What do you think?


Re: Extensible storage manager API - smgr hooks

2022-03-21 Thread Andres Freund
Hi,

On 2021-06-30 05:36:11 +0300, Yura Sokolov wrote:
> Anastasia Lubennikova wrote 2021-06-30 00:49:
> > Hi, hackers!
> > 
> > Many recently discussed features can make use of an extensible storage
> > manager API. Namely, storage level compression and encryption [1],
> > [2], [3], disk quota feature [4], SLRU storage changes [5], and any
> > other features that may want to substitute PostgreSQL storage layer
> > with their implementation (i.e. lazy_restore [6]).
> > 
> > Attached is a proposal to change smgr API to make it extensible.  The
> > idea is to add a hook for plugins to get control in smgr and define
> > custom storage managers. The patch replaces smgrsw[] array and smgr_sw
> > selector with smgr() function that loads f_smgr implementation.
> > 
> > As before it has only one implementation - smgr_md, which is wrapped
> > into smgr_standard().
> > 
> > To create custom implementation, a developer needs to implement smgr
> > API functions
> > static const struct f_smgr smgr_custom =
> > {
> > .smgr_init = custominit,
> > ...
> > }
> > 
> > create a hook function
> > 
> >const f_smgr * smgr_custom(BackendId backend, RelFileNode rnode)
> >   {
> >   //Here we can also add some logic and choose which smgr to use
> > based on rnode and backend
> >   return &smgr_custom;
> >   }
> > 
> > and finally set the hook:
> > smgr_hook = smgr_custom;
> > 
> > [1]
> > https://www.postgresql.org/message-id/flat/11996861554042...@iva4-dd95b404a60b.qloud-c.yandex.net
> > [2]
> > https://www.postgresql.org/message-id/flat/272dd2d9.e52a.17235f2c050.Coremail.chjischj%40163.com
> > [3] https://postgrespro.com/docs/enterprise/9.6/cfs
> > [4]
> > https://www.postgresql.org/message-id/flat/CAB0yre%3DRP_ho6Bq4cV23ELKxRcfhV2Yqrb1zHp0RfUPEWCnBRw%40mail.gmail.com
> > [5]
> > https://www.postgresql.org/message-id/flat/20180814213500.GA74618%4060f81dc409fc.ant.amazon.com
> > [6]
> > https://wiki.postgresql.org/wiki/PGCon_2021_Fun_With_WAL#Lazy_Restore
> > 
> > --
> > 
> > Best regards,
> > Lubennikova Anastasia
> 
> Good day, Anastasia.
> 
> I also think smgr should be extended with implementations other than md.
> But how will the concrete implementation be chosen for a particular
> relation?
> I believe it should be an (immutable!) property of the tablespace, and
> should be passed to smgropen. The patch in its current state doesn't show a
> clear way to distinguish different implementations per relation.
> 
> I don't think the patch should be that invasive. smgrsw could be a pointer
> to an array instead of the static array it is now, and then
> reln->smgr_which would keep the same meaning. Yes, it would then need a way
> to select a specific implementation, but something like a
> `char smgr_name[NAMEDATALEN]` field with a linear search in the (I believe)
> small smgrsw array should be enough.
> 
> Maybe I'm missing something?

There has been no activity on this thread for > 6 months. Therefore I'm
marking it as returned with feedback. Anastasia, if you want to work on this,
please do, but there's obviously no way it can be merged into 15...

Greetings,

Andres




Re: Extensible storage manager API - smgr hooks

2021-06-29 Thread Yura Sokolov

Anastasia Lubennikova wrote 2021-06-30 00:49:

Hi, hackers!

Many recently discussed features can make use of an extensible storage
manager API. Namely, storage level compression and encryption [1],
[2], [3], disk quota feature [4], SLRU storage changes [5], and any
other features that may want to substitute PostgreSQL storage layer
with their implementation (i.e. lazy_restore [6]).

Attached is a proposal to change smgr API to make it extensible.  The
idea is to add a hook for plugins to get control in smgr and define
custom storage managers. The patch replaces smgrsw[] array and smgr_sw
selector with smgr() function that loads f_smgr implementation.

As before it has only one implementation - smgr_md, which is wrapped
into smgr_standard().

To create custom implementation, a developer needs to implement smgr
API functions
static const struct f_smgr smgr_custom =
{
.smgr_init = custominit,
...
}

create a hook function

   const f_smgr * smgr_custom(BackendId backend, RelFileNode rnode)
  {
  //Here we can also add some logic and choose which smgr to use
based on rnode and backend
  return &smgr_custom;
  }

and finally set the hook:
smgr_hook = smgr_custom;

[1]
https://www.postgresql.org/message-id/flat/11996861554042...@iva4-dd95b404a60b.qloud-c.yandex.net
[2]
https://www.postgresql.org/message-id/flat/272dd2d9.e52a.17235f2c050.Coremail.chjischj%40163.com
[3] https://postgrespro.com/docs/enterprise/9.6/cfs
[4]
https://www.postgresql.org/message-id/flat/CAB0yre%3DRP_ho6Bq4cV23ELKxRcfhV2Yqrb1zHp0RfUPEWCnBRw%40mail.gmail.com
[5]
https://www.postgresql.org/message-id/flat/20180814213500.GA74618%4060f81dc409fc.ant.amazon.com
[6]
https://wiki.postgresql.org/wiki/PGCon_2021_Fun_With_WAL#Lazy_Restore

--

Best regards,
Lubennikova Anastasia


Good day, Anastasia.

I also think smgr should be extended with implementations other than md.
But how will the concrete implementation be chosen for a particular
relation?
I believe it should be an (immutable!) property of the tablespace, and
should be passed to smgropen. The patch in its current state doesn't show a
clear way to distinguish different implementations per relation.

I don't think the patch should be that invasive. smgrsw could be a pointer
to an array instead of the static array it is now, and then
reln->smgr_which would keep the same meaning. Yes, it would then need a way
to select a specific implementation, but something like a
`char smgr_name[NAMEDATALEN]` field with a linear search in the (I believe)
small smgrsw array should be enough.
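
A minimal sketch of that idea in standalone C, under simplifying assumptions: f_smgr is cut down to a name and one callback, the table is a fixed-size array rather than a pointer to a growable one, and smgr_register and smgr_lookup are hypothetical helper names that exist neither in PostgreSQL nor in the patch. The index returned by either function plays the role of reln->smgr_which.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAMEDATALEN 64
#define MAX_SMGRS 8				/* arbitrary limit for this sketch */

/* Cut-down f_smgr: a name plus one representative callback. */
typedef struct f_smgr
{
	char		smgr_name[NAMEDATALEN];
	void		(*smgr_init) (void);	/* may be NULL */
} f_smgr;

static f_smgr smgrsw[MAX_SMGRS];
static int	NSmgr = 0;

/* Linear search by name; returns the index in smgrsw, or -1. */
static int
smgr_lookup(const char *name)
{
	assert(name != NULL);
	for (int i = 0; i < NSmgr; i++)
	{
		if (strcmp(smgrsw[i].smgr_name, name) == 0)
			return i;
	}
	return -1;
}

/*
 * Register a storage manager under a unique name.
 * Returns its index, or -1 if the table is full, the name is too long,
 * or the name is already taken.
 */
static int
smgr_register(const char *name, void (*init) (void))
{
	assert(name != NULL);
	if (NSmgr >= MAX_SMGRS || strlen(name) >= NAMEDATALEN)
		return -1;
	if (smgr_lookup(name) >= 0)
		return -1;
	strcpy(smgrsw[NSmgr].smgr_name, name);
	smgrsw[NSmgr].smgr_init = init;
	return NSmgr++;
}
```

With only a handful of entries the linear scan costs next to nothing, and it would run once per smgropen rather than once per block access.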

Maybe I'm missing something?

regards,
Sokolov Yura.

Extensible storage manager API - smgr hooks

2021-06-29 Thread Anastasia Lubennikova
Hi, hackers!

Many recently discussed features can make use of an extensible storage
manager API. Namely, storage level compression and encryption [1], [2], [3],
disk quota feature [4], SLRU storage changes [5], and any other features
that may want to substitute the PostgreSQL storage layer with their own
implementation (e.g. lazy_restore [6]).

Attached is a proposal to change smgr API to make it extensible.  The idea
is to add a hook for plugins to get control in smgr and define custom
storage managers. The patch replaces smgrsw[] array and smgr_sw selector
with smgr() function that loads f_smgr implementation.

As before it has only one implementation - smgr_md, which is wrapped into
smgr_standard().

To create a custom implementation, a developer needs to implement the smgr API
functions:

static const struct f_smgr smgr_custom =
{
    .smgr_init = custominit,
    ...
};

create a hook function:

const f_smgr * smgr_custom(BackendId backend, RelFileNode rnode)
{
    /* Here we can also add some logic and choose which smgr to use
     * based on rnode and backend */
    return &smgr_custom;
}

and finally set the hook:

smgr_hook = smgr_custom;
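
For illustration, the dispatch described above can be modelled in standalone C roughly as follows. This is a sketch under stated assumptions, not the patch itself: BackendId, RelFileNode and f_smgr are simplified stand-ins, the name field and CUSTOM_TABLESPACE are invented for the example, and the hook function is called smgr_custom_hook so that it does not clash with the smgr_custom struct.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for PostgreSQL types; the real definitions differ. */
typedef int BackendId;

typedef struct RelFileNode
{
	unsigned int spcNode;		/* tablespace */
	unsigned int dbNode;		/* database */
	unsigned int relNode;		/* relation */
} RelFileNode;

typedef struct f_smgr
{
	const char *name;			/* illustration only */
	void		(*smgr_init) (void);	/* may be NULL */
} f_smgr;

typedef const f_smgr *(*smgr_hook_type) (BackendId backend, RelFileNode rnode);

/* Set by an extension; NULL means "use the built-in manager". */
static smgr_hook_type smgr_hook = NULL;

static const f_smgr smgr_md = {.name = "md", .smgr_init = NULL};
static const f_smgr smgr_custom = {.name = "custom", .smgr_init = NULL};

/* Default choice: always the magnetic-disk manager. */
static const f_smgr *
smgr_standard(BackendId backend, RelFileNode rnode)
{
	(void) backend;
	(void) rnode;
	return &smgr_md;
}

/* Dispatcher: consult the hook if one is installed. */
static const f_smgr *
smgr(BackendId backend, RelFileNode rnode)
{
	const f_smgr *result;

	result = smgr_hook ? smgr_hook(backend, rnode)
		: smgr_standard(backend, rnode);
	assert(result != NULL);
	return result;
}

/* Hypothetical policy: one tablespace goes to the custom manager. */
#define CUSTOM_TABLESPACE 16400

static const f_smgr *
smgr_custom_hook(BackendId backend, RelFileNode rnode)
{
	if (rnode.spcNode == CUSTOM_TABLESPACE)
		return &smgr_custom;
	return smgr_standard(backend, rnode);
}
```

An extension would then do `smgr_hook = smgr_custom_hook;` at load time. Falling back to smgr_standard() inside the hook keeps every other relation on md.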

[1]
https://www.postgresql.org/message-id/flat/11996861554042...@iva4-dd95b404a60b.qloud-c.yandex.net
[2]
https://www.postgresql.org/message-id/flat/272dd2d9.e52a.17235f2c050.Coremail.chjischj%40163.com
[3] https://postgrespro.com/docs/enterprise/9.6/cfs
[4]
https://www.postgresql.org/message-id/flat/CAB0yre%3DRP_ho6Bq4cV23ELKxRcfhV2Yqrb1zHp0RfUPEWCnBRw%40mail.gmail.com
[5]
https://www.postgresql.org/message-id/flat/20180814213500.GA74618%4060f81dc409fc.ant.amazon.com
[6] https://wiki.postgresql.org/wiki/PGCon_2021_Fun_With_WAL#Lazy_Restore


-- 
Best regards,
Lubennikova Anastasia

From 90085398f5ecc90d6b7caa318bd3d5f2867ef95c Mon Sep 17 00:00:00 2001
From: anastasia 
Date: Tue, 29 Jun 2021 22:16:26 +0300
Subject: [PATCH] smgr_api.patch

Make smgr API pluggable. Add smgr_hook that can be used to define custom storage managers.
Remove smgrsw[] array and smgr_sw selector. Instead, smgropen() uses smgr() function to load
f_smgr implementation using smgr_hook.

Also add smgr_init_hook and smgr_shutdown_hook.
And a lot of mechanical changes in smgr.c functions.
---
 src/backend/storage/smgr/smgr.c | 136 ++--
 src/include/storage/smgr.h  |  56 -
 2 files changed, 116 insertions(+), 76 deletions(-)

diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index 4dc24649df..5f1981a353 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -26,47 +26,8 @@
 #include "utils/hsearch.h"
 #include "utils/inval.h"
 
-
-/*
- * This struct of function pointers defines the API between smgr.c and
- * any individual storage manager module.  Note that smgr subfunctions are
- * generally expected to report problems via elog(ERROR).  An exception is
- * that smgr_unlink should use elog(WARNING), rather than erroring out,
- * because we normally unlink relations during post-commit/abort cleanup,
- * and so it's too late to raise an error.  Also, various conditions that
- * would normally be errors should be allowed during bootstrap and/or WAL
- * recovery --- see comments in md.c for details.
- */
-typedef struct f_smgr
-{
-	void		(*smgr_init) (void);	/* may be NULL */
-	void		(*smgr_shutdown) (void);	/* may be NULL */
-	void		(*smgr_open) (SMgrRelation reln);
-	void		(*smgr_close) (SMgrRelation reln, ForkNumber forknum);
-	void		(*smgr_create) (SMgrRelation reln, ForkNumber forknum,
-								bool isRedo);
-	bool		(*smgr_exists) (SMgrRelation reln, ForkNumber forknum);
-	void		(*smgr_unlink) (RelFileNodeBackend rnode, ForkNumber forknum,
-								bool isRedo);
-	void		(*smgr_extend) (SMgrRelation reln, ForkNumber forknum,
-								BlockNumber blocknum, char *buffer, bool skipFsync);
-	bool		(*smgr_prefetch) (SMgrRelation reln, ForkNumber forknum,
-								  BlockNumber blocknum);
-	void		(*smgr_read) (SMgrRelation reln, ForkNumber forknum,
-							  BlockNumber blocknum, char *buffer);
-	void		(*smgr_write) (SMgrRelation reln, ForkNumber forknum,
-							   BlockNumber blocknum, char *buffer, bool skipFsync);
-	void		(*smgr_writeback) (SMgrRelation reln, ForkNumber forknum,
-								   BlockNumber blocknum, BlockNumber nblocks);
-	BlockNumber (*smgr_nblocks) (SMgrRelation reln, ForkNumber forknum);
-	void		(*smgr_truncate) (SMgrRelation reln, ForkNumber forknum,
-								  BlockNumber nblocks);
-	void		(*smgr_immedsync) (SMgrRelation reln, ForkNumber forknum);
-} f_smgr;
-
-static const f_smgr smgrsw[] = {
+static const f_smgr smgr_md = {
 	/* magnetic disk */
-	{
 		.smgr_init = mdinit,
 		.smgr_shutdown = NULL,
 		.smgr_open = mdopen,
@@ -82,11 +43,8 @@ static const f_smgr smgrsw[] = {
 		.smgr_nblocks = mdnblocks,
 		.smgr_truncate = mdtruncate,
 		.smgr_immedsync = mdimmedsync,
-	}
 };
 
-static const int NSmgr = lengthof(smgrsw);
-
 /*
  * Each backend has a