Hi

2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzh...@pivotal.io>:
> Thanks Chapman.
> @Pavel, could you please explain more about your second suggestion,
> "implement some quotas on storage level"?

See attached patch - it is very simple - and good enough for our purposes.

Regards

Pavel

> We will not keep long-lived processes attached to all databases (just
> like the servers with thousands of databases you mentioned).
> And you are right, we could share ideas with the autovacuum process and
> fork worker processes as needed.
> "autovacuum checks for tables that have had a large number of inserted,
> updated or deleted tuples. These checks use the statistics collection
> facility"
> The diskquota process is similar to autovacuum in caring about inserts,
> but the difference is that it also cares about VACUUM FULL, TRUNCATE and
> DROP, while UPDATE and DELETE may not be of interest, since no file
> change happens there. So a separate diskquota process is preferred.
>
> So if we implement disk quota as a full native feature, in the first
> version I would prefer to implement the following features:
> 1 Fork a diskquota launcher process under the postmaster's ServerLoop;
> it is long-lived.
> 2 The diskquota launcher process is responsible for creating a diskquota
> worker process for every database.
> 3 Diskquota settings are stored in a separate catalog table for each
> database.
> 4 Initialization stage: the diskquota launcher process creates diskquota
> worker processes for all databases (traversing them as autovacuum does).
> Each worker process calculates the disk usage of the database's objects
> and reads their diskquota settings. If any object exceeds its quota
> limit, it is put into a blacklist in shared memory, which will later be
> used by the enforcement operator. The worker process exits when its work
> is done.
> 5 Running stage: the diskquota launcher process creates a diskquota
> worker process for any database with a large number of INSERT, COPY,
> TRUNCATE, DROP etc. statements, or a CREATE DISK QUOTA statement. The
> worker process updates the file sizes of the objects containing the
> result relation and compares them with the diskquota settings. Again,
> objects exceeding their quota limit are put into the blacklist, and
> removed from the blacklist vice versa. The worker process exits when its
> work is done, and a GUC could control how often a worker process is
> restarted for a specific database. As you know, this GUC also controls
> the delay before we do enforcement.
> 6 Enforcement: when a postgres backend executes a query, it checks the
> blacklist in shared memory to determine whether the query is allowed
> (before execution) or needs to be rolled back (while executing).
>
> If we implemented disk quota as an extension, we could just use a
> background worker to start the diskquota launcher process, and use
> RegisterDynamicBackgroundWorker() to fork child diskquota worker
> processes from the launcher process, as suggested by @Chapman.
> Diskquota settings could be stored in a user table in a separate schema
> for each database (schema and table created by the CREATE EXTENSION
> statement), just like what Heikki has done in the pg_quota project. But
> in this case, we would need to run CREATE EXTENSION in each database
> before a diskquota worker process can be set up for that database.
>
> Any comments on the above design, and which is preferred as the POC,
> native feature or extension?
>
>
> -- Hubert
>
>
>
> On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <pavel.steh...@gmail.com>
> wrote:
>
>>
>>
>> 2018-08-30 16:22 GMT+02:00 Chapman Flack <c...@anastigmatix.net>:
>>
>>> On 08/30/2018 09:57 AM, Hubert Zhang wrote:
>>>
>>> > 2 Keep one worker process for each database, but use a parent/global
>>> > quota worker process to manage the lifecycle of the database-level
>>> > worker processes. It could handle newly created databases (avoiding a
>>> > database restart) and save resources when a database is not in use.
>>> > But this needs the worker processes to be hierarchical; the
>>> > postmaster becomes the grandfather of the database-level worker
>>> > processes in this case.
>>>
>>> I am using background workers this way in 9.5 at $work.
>>>
>>> In my case, one worker lives forever, wakes up on a set period, and
>>> starts a short-lived worker for every database, waiting for each
>>> one before starting the next.
>>>
>>> It was straightforward to implement. Looking back over the code,
>>> I see the global worker assigns its own PID to worker.bgw_notify_pid
>>> of each of its children, and also obtains a handle for each child
>>> from RegisterDynamicBackgroundWorker().
>>>
>>> I imagine the global quota worker would prefer to start workers
>>> for every database and then just wait for notifications from any
>>> of them, but that seems equally straightforward at first glance.
>>
>> There are servers with thousands of databases; a worker per database is
>> not a good idea.
>>
>> It should share ideas, and code, with the autovacuum process.
>>
>> I am not sure how effective an implementation based on bgworkers can
>> be. On servers with a large set of databases and a large set of tables,
>> it may identify a too-big table too late.
>>
>> Isn't it better to implement some quotas on the storage level?
>>
>> Regards
>>
>> Pavel
>>
>>> -Chap
>>
>
> --
> Thanks
>
> Hubert Zhang
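As a concrete reference for the bgworker route discussed above, here is a
minimal sketch - not part of the attached patch - of how a long-lived
launcher could start a short-lived per-database worker via
RegisterDynamicBackgroundWorker(). The library name "diskquota" and the
entry point diskquota_worker_main are hypothetical placeholders:

/*
 * Sketch: launch a short-lived diskquota worker for one database.
 * Assumes an extension library "diskquota" exporting
 * diskquota_worker_main(Datum) as the worker entry point.
 */
#include "postgres.h"

#include "miscadmin.h"
#include "postmaster/bgworker.h"

static BackgroundWorkerHandle *
launch_worker_for_database(Oid dboid)
{
	BackgroundWorker worker;
	BackgroundWorkerHandle *handle;

	memset(&worker, 0, sizeof(worker));
	worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
		BGWORKER_BACKEND_DATABASE_CONNECTION;
	worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
	worker.bgw_restart_time = BGW_NEVER_RESTART;	/* exits when work is done */
	snprintf(worker.bgw_library_name, BGW_MAXLEN, "diskquota");
	snprintf(worker.bgw_function_name, BGW_MAXLEN, "diskquota_worker_main");
	snprintf(worker.bgw_name, BGW_MAXLEN, "diskquota worker for db %u", dboid);
	worker.bgw_main_arg = ObjectIdGetDatum(dboid);
	worker.bgw_notify_pid = MyProcPid;	/* launcher gets SIGUSR1 on start/exit */

	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
		return NULL;			/* out of max_worker_processes slots */

	return handle;
}

The worker entry point would connect with
BackgroundWorkerInitializeConnectionByOid() and exit when its scan is done;
with bgw_restart_time = BGW_NEVER_RESTART nothing stays resident per
database, which is one way to address the thousands-of-databases concern.
The launcher can wait on the returned handle with
WaitForBackgroundWorkerShutdown() to serialize workers, as Chapman does, or
collect the notifications and run several concurrently.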
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index bb96881cad..619163e237 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -35,6 +35,7 @@
 #include "storage/bufmgr.h"
 #include "storage/relfilenode.h"
 #include "storage/smgr.h"
+#include "utils/guc.h"
 #include "utils/hsearch.h"
 #include "utils/memutils.h"
 #include "pg_trace.h"
@@ -198,6 +199,11 @@ static MdfdVec *_mdfd_getseg(SMgrRelation reln, ForkNumber forkno,
 static BlockNumber _mdnblocks(SMgrRelation reln, ForkNumber forknum,
 			MdfdVec *seg);
 
+/*
+ * limits for relations size
+ */
+int		max_blocks;
+int		warn_blocks;
 
 /*
  *	mdinit() -- Initialize private state for magnetic disk storage manager.
@@ -504,6 +510,14 @@ mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 	Assert(blocknum >= mdnblocks(reln, forknum));
 #endif
 
+	if (max_blocks != -1 && blocknum > (BlockNumber) max_blocks)
+		ereport(ERROR,
+				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+				 errmsg("cannot extend file beyond %u blocks",
+						max_blocks),
+				 errhint("The session file limit defined by \"hard_relation_limit\" (%s) was exceeded.",
+						 GetConfigOptionByName("hard_relation_limit", NULL, false))));
+
 	/*
 	 * If a relation manages to grow to 2^32-1 blocks, refuse to extend it any
 	 * more --- we mustn't create a block whose number actually is
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 9b533d52c8..ecc7906af2 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -161,6 +161,12 @@ static int	syslog_facility = LOG_LOCAL0;
 static int	syslog_facility = 0;
 #endif
 
+/*
+ * limits for relations size
+ */
+extern int	max_blocks;
+extern int	warn_blocks;
+
 static void assign_syslog_facility(int newval, void *extra);
 static void assign_syslog_ident(const char *newval, void *extra);
 static void assign_session_replication_role(int newval, void *extra);
@@ -2910,6 +2916,17 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"hard_relation_limit", PGC_USERSET, RESOURCES,
+			gettext_noop("Sets the maximum number of blocks in a relation."),
+			gettext_noop("The default is -1 (turning this feature off)."),
+			GUC_UNIT_BLOCKS | GUC_NOT_IN_SAMPLE
+		},
+		&max_blocks,
+		-1, -1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"track_activity_query_size", PGC_POSTMASTER, RESOURCES_MEM,
 			gettext_noop("Sets the size reserved for pg_stat_activity.query, in bytes."),
diff --git a/src/test/regress/expected/guc.out b/src/test/regress/expected/guc.out
index 43ac5f5f11..6a3e090c9d 100644
--- a/src/test/regress/expected/guc.out
+++ b/src/test/regress/expected/guc.out
@@ -751,6 +751,12 @@ select current_setting('nosuch.setting', true);
  nada
 (1 row)
 
+-- set limits for tables
+set hard_relation_limit = '1MB';
+create table test_limit_hard as select 1 a, 2 b, 3 c, 4 d from generate_series(1,100000);
+ERROR:  cannot extend file beyond 128 blocks
+HINT:  The session file limit defined by "hard_relation_limit" (1MB) was exceeded.
+set hard_relation_limit to default;
 -- Normally, CREATE FUNCTION should complain about invalid values in
 -- function SET options; but not if check_function_bodies is off,
 -- because that creates ordering hazards for pg_dump
diff --git a/src/test/regress/sql/guc.sql b/src/test/regress/sql/guc.sql
index 23e5029780..bee38c013d 100644
--- a/src/test/regress/sql/guc.sql
+++ b/src/test/regress/sql/guc.sql
@@ -271,6 +271,15 @@ select current_setting('nosuch.setting');
 select current_setting('nosuch.setting', false);
 select current_setting('nosuch.setting', true);
 
+
+-- set limits for tables
+set hard_relation_limit = '1MB';
+
+create table test_limit_hard as select 1 a, 2 b, 3 c, 4 d from generate_series(1,100000);
+
+set hard_relation_limit to default;
+
+
 -- Normally, CREATE FUNCTION should complain about invalid values in
 -- function SET options; but not if check_function_bodies is off,
 -- because that creates ordering hazards for pg_dump
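For completeness, a possible usage sketch assuming the patch above is
applied. Since hard_relation_limit is declared PGC_USERSET, it can be set
per session as in the regression test, or pinned per role or per database
through the standard GUC machinery (the role and database names below are
made up):

-- per session, as in the regression test above
set hard_relation_limit = '1GB';

-- or persistently for a role or a database (hypothetical names)
alter role etl_writer set hard_relation_limit = '10GB';
alter database scratch set hard_relation_limit = '512MB';

-- back to the default (-1), which disables the limit
set hard_relation_limit to default;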