Re: [PoC] configurable out of disk space elog level

2022-11-22 Thread Greg Stark
On Thu, 17 Nov 2022 at 14:56, Robert Haas  wrote:
>
> Having a switch for one particular kind of error (out of many that
> could possibly occur) that triggers one particular coping strategy
> (out of many that could possibly be used) seems far too specific a
> thing to add as a core feature. And even if we had something much more
> general, I'm not sure why that should go into the database rather than
> being implemented outside it. After all, nothing at all prevents the
> user from scanning the database logs for "out of space" errors and
> shutting down the database if any are found. While you're at it, you
> could make your monitoring script also check the free space on the
> relevant partition using statfs() and page somebody if the utilization
> goes above 95% or whatever threshold you like, which would probably
> avoid service outages much more effectively than $SUBJECT.

I have often thought we report a lot of errors to the user as
transaction errors that database admins are often going to feel they
would rather treat as system-wide errors. Often the error the user
sees seems like a very low level error with no context that they can't
do anythign about. This seems to be an example of that.

I don't really have a good solution for it but I do think most users
would rather deal with these errors at a higher level than individual
queries from individual clients. Out of disk space, hardware errors,
out of memory, etc they would rather handle in one centralized place
as a global condition. You can work around that with
middleware/libraries/monitoring but it's kind of working around the
database giving you the information at the wrong time and place for
your needs.




Re: [PoC] configurable out of disk space elog level

2022-11-18 Thread Maxim Orlov
> I don't think this is a good feature to add to PostgreSQL. First, it's
> unclear why stopping the cluster is a desirable behavior. It doesn't
> stop the user transactions from failing; it just makes them fail in
> some other way. Now it is of course perfectly legitimate for a
> particular user to want that particular behavior anyway, but there are
> a bunch of other things that a user could equally legitimately want to
> do, like page the DBA or trigger a failover or kill off sessions that
> are using large temporary relations or whatever. And, equally, there
> are many other operating system errors to which a user could want the
> database system to respond in similar ways. For example, someone might
> want any given one of those treatments when an I/O error occurs
> writing to the data directory, or a read-only filesystem error, or a
> permission denied error.
>
> Having a switch for one particular kind of error (out of many that
> could possibly occur) that triggers one particular coping strategy
> (out of many that could possibly be used) seems far too specific a
> thing to add as a core feature. And even if we had something much more
> general, I'm not sure why that should go into the database rather than
> being implemented outside it. After all, nothing at all prevents the
> user from scanning the database logs for "out of space" errors and
> shutting down the database if any are found.


Yes, you are right. Actually, this is one of possible ways to deal with
described situation I
mentioned above. And if I would deal with such a task, I would make it via
log monitoring.
The question was: "could we be more general here?". Probably, not.


> While you're at it, you
> could make your monitoring script also check the free space on the
> relevant partition using statfs() and page somebody if the utilization
> goes above 95% or whatever threshold you like, which would probably
> avoid service outages much more effectively than $SUBJECT.
>
> I just can't see much real benefit in putting this logic inside the
> database.
>

OK, I got it. Thanks for your thoughts!

-- 
Best regards,
Maxim Orlov.


Re: [PoC] configurable out of disk space elog level

2022-11-17 Thread Robert Haas
On Wed, Nov 16, 2022 at 7:59 AM Maxim Orlov  wrote:
> The customer, mentioned above, in this particular case, would be glad to be 
> able to have a mechanism to stop the cluster.
> Again, in this concrete case.
>
> My proposal is to add a tablespace option in order to be able to configure 
> which behaviour is appropriate for a
> particular user. I've decided to call this option “on_no_space” for now. If 
> anyone has a better naming for this feature,
> please, report.

I don't think this is a good feature to add to PostgreSQL. First, it's
unclear why stopping the cluster is a desirable behavior. It doesn't
stop the user transactions from failing; it just makes them fail in
some other way. Now it is of course perfectly legitimate for a
particular user to want that particular behavior anyway, but there are
a bunch of other things that a user could equally legitimately want to
do, like page the DBA or trigger a failover or kill off sessions that
are using large temporary relations or whatever. And, equally, there
are many other operating system errors to which a user could want the
database system to respond in similar ways. For example, someone might
want any given one of those treatments when an I/O error occurs
writing to the data directory, or a read-only filesystem error, or a
permission denied error.

Having a switch for one particular kind of error (out of many that
could possibly occur) that triggers one particular coping strategy
(out of many that could possibly be used) seems far too specific a
thing to add as a core feature. And even if we had something much more
general, I'm not sure why that should go into the database rather than
being implemented outside it. After all, nothing at all prevents the
user from scanning the database logs for "out of space" errors and
shutting down the database if any are found. While you're at it, you
could make your monitoring script also check the free space on the
relevant partition using statfs() and page somebody if the utilization
goes above 95% or whatever threshold you like, which would probably
avoid service outages much more effectively than $SUBJECT.

I just can't see much real benefit in putting this logic inside the database.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PoC] configurable out of disk space elog level

2022-11-17 Thread Andres Freund
On 2022-11-17 14:40:02 +0300, Maxim Orlov wrote:
> On Wed, 16 Nov 2022 at 20:41, Andres Freund  wrote:
> > that aside, we can't do catalog accesses in all kinds of environments that
> > this currently is active in - most importantly it's affecting the startup
> > process. We don't do catalog accesses in the startup process, and even if
> > we
> > were to do so, we couldn't unconditionally because the catalog might not
> > even
> > be consistent at this point (nor is it guaranteed that the wal_level even
> > allows to access catalogs during recovery).
> >
> Yep, that is why I do use in get_tablespace_elevel:
> +   /*
> +* Use GUC level only in normal mode.
> +*/
> +   if (!IsNormalProcessingMode())
> +   return ERROR;
> 
> Anyway, I appreciate the opinion, thank you!

The startup process is in normal processing mode.




Re: [PoC] configurable out of disk space elog level

2022-11-17 Thread Maxim Orlov
On Wed, 16 Nov 2022 at 20:41, Andres Freund  wrote:

> Hi,
> You can't do catalog access below the bufmgr.c layer. It could lead to all
> kinds of nastiness, including potentially recursing back to md.c. Even
> leaving
>
Yep, this is my biggest concern. It turns out, that the way to make such a
feature is to use just GUC for all tablespaces or
forward elevel "from above".


> that aside, we can't do catalog accesses in all kinds of environments that
> this currently is active in - most importantly it's affecting the startup
> process. We don't do catalog accesses in the startup process, and even if
> we
> were to do so, we couldn't unconditionally because the catalog might not
> even
> be consistent at this point (nor is it guaranteed that the wal_level even
> allows to access catalogs during recovery).
>
Yep, that is why I do use in get_tablespace_elevel:
+   /*
+* Use GUC level only in normal mode.
+*/
+   if (!IsNormalProcessingMode())
+   return ERROR;

Anyway, I appreciate the opinion, thank you!

-- 
Best regards,
Maxim Orlov.


Re: [PoC] configurable out of disk space elog level

2022-11-16 Thread Andres Freund
Hi,

On 2022-11-16 15:59:09 +0300, Maxim Orlov wrote:
> Patch is posted as PoC is attached.

> --- a/src/backend/storage/smgr/md.c
> +++ b/src/backend/storage/smgr/md.c
> @@ -40,6 +40,7 @@
>  #include "storage/sync.h"
>  #include "utils/hsearch.h"
>  #include "utils/memutils.h"
> +#include "utils/spccache.h"
>  
>  /*
>   *   The magnetic disk storage manager keeps track of open file
> @@ -479,14 +480,16 @@ mdextend(SMgrRelation reln, ForkNumber forknum, 
> BlockNumber blocknum,
>  
>   if ((nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ, seekpos, 
> WAIT_EVENT_DATA_FILE_EXTEND)) != BLCKSZ)
>   {
> + int elevel = 
> get_tablespace_elevel(reln->smgr_rlocator.locator.spcOid);
> +

You can't do catalog access below the bufmgr.c layer. It could lead to all
kinds of nastiness, including potentially recursing back to md.c. Even leaving
that aside, we can't do catalog accesses in all kinds of environments that
this currently is active in - most importantly it's affecting the startup
process. We don't do catalog accesses in the startup process, and even if we
were to do so, we couldn't unconditionally because the catalog might not even
be consistent at this point (nor is it guaranteed that the wal_level even
allows to access catalogs during recovery).

I'm not convinced by the usecase in the first place, but purely technically I
think it's not viable to make this a tablespace option.

Greetings,

Andres Freund




Re: [PoC] configurable out of disk space elog level

2022-11-16 Thread Pavel Borisov
Hi, Maxim!
> My proposal is to add a tablespace option in order to be able to configure 
> which behaviour is appropriate for a
> particular user. I've decided to call this option “on_no_space” for now. If 
> anyone has a better naming for this feature,
> please, report.
>
> So, the idea is to add both GUC and tablespace option “on_no_space”. The 
> tablespace option defines the behaviour of the
> cluster for a particular tablespace in “on_no_space” situation. The GUC 
> defines the default value of tablespace option.

I suppose there can be a kind of attack with this feature i.e.

- If someone already has his own tablespace he can do:
ALTER TABLESPACE my SET on_no_space=fatal; // This needs tablespace
ownership, not superuser permission.
- Then fill up his own db with garbage to fill his tablespace.
- Then all the db cluster will go fatal, even if the other users'
tablespaces are almost free.

If this can be avoided, I think the patch can be useful.

Regards,
Pavel Borisov.




[PoC] configurable out of disk space elog level

2022-11-16 Thread Maxim Orlov
Hi!


PROBLEM

Our customer stumble onto the next behaviour of the Postgres cluster: if
disk space is exhausted, Postgres continues to
work until WAL can be successfully written. Thus, upon disk space
exhaustion, clients will get an “ERROR: could not
extend file “base/X/X”: No space left on device” messages and
transactions will be aborted. But the cluster
continues to work for a quite some time.

This behaviour of the PostgreSQL, of course, is perfectly legit. Cluster
just translate OS error to the user and can do
nothing about it, expecting space may be available later.

On the other hand, users continues to send more data and having more and
more transactions to be aborted.

There are next possible ways to diagnose described situation:
 —external monitoring system;
 —log analysis;
 —create/drop table and analyse results.

Each one have advantages and disadvantages. I'm not going to dive deeper
here, if you don't mind.

The customer, mentioned above, in this particular case, would be glad to be
able to have a mechanism to stop the cluster.
Again, in this concrete case.


PROPOSAL

My proposal is to add a tablespace option in order to be able to configure
which behaviour is appropriate for a
particular user. I've decided to call this option “on_no_space” for now. If
anyone has a better naming for this feature,
please, report.

So, the idea is to add both GUC and tablespace option “on_no_space”. The
tablespace option defines the behaviour of the
cluster for a particular tablespace in “on_no_space” situation. The GUC
defines the default value of tablespace option.

Patch is posted as PoC is attached.

Here's what it looks like:
===
== Create 100Mb disk
$ dd if=/dev/zero of=/tmp/foo.img bs=100M count=1
$ mkfs.ext4 /tmp/foo.img
$ mkdir /tmp/foo
$ sudo mount -t ext4 -o loop /tmp/foo.img /tmp/foo
$ sudo chown -R orlov:orlov /tmp/foo
===
== Into psql
postgres=# CREATE TABLESPACE foo LOCATION '/tmp/foo' WITH
(on_no_space=fatal);
CREATE TABLESPACE
postgres=# \db+
   List of tablespaces
Name| Owner | Location | Access privileges |   Options   |
 Size   | Description
+---+--+---+-+-+-
 foo| orlov | /tmp/foo |   | {on_no_space=fatal} |
0 bytes |
...

postgres=# CREATE TABLE bar(qux int, quux text) WITH (autovacuum_enabled =
false) TABLESPACE foo;
CREATE TABLE
postgres=# INSERT INTO bar(qux, quux) SELECT id, md5(id::text) FROM
generate_series(1, 1000) AS id;
FATAL:  could not extend file "pg_tblspc/16384/PG_16_202211121/5/16385": No
space left on device
HINT:  Check free disk space.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
===


CAVEATS

Again, I've posted this patch as a PoC. This is not a complete realization
of described functionality. AFAICS, there are
next problems:
 - I have to put get_tablespace_elevel call in RelationGetBufferForTuple in
order to tablespace in cache; overwise,
   cache miss in get_tablespace triggers assertion failing in lock.c:887
(Assert("!IsRelationExtensionLockHeld")).
   This assertion was added by commit 15ef6ff4 (see [0] for details).
 - What error should be when mdextend called not to insert a tuple into a
heap (WAL applying, for example)?

Maybe, adding just GUC without ability to customize certain tablespaces to
define "out of disk space" behaviour is enough?
I would appreciate it if you give your opinions on a subject.

-- 
Best regards,
Maxim Orlov.

[0]
https://www.postgresql.org/message-id/flat/CAD21AoCmT3cFQUN4aVvzy5chw7DuzXrJCbrjTU05B%2BSs%3DGn1LA%40mail.gmail.com


v1-0001-Add-out-of-disk-space-elog-level.patch
Description: Binary data