Hi

2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzh...@pivotal.io>:
> Thanks Chapman.
> @Pavel, could you please explain more about your second suggestion,
> "implement some quotas on storage level"?

See attached patch - it is very simple - and good enough for our purposes.

Regards

Pavel

> We will not keep long-lived processes attached to all databases (just
> like the servers with thousands of databases you mentioned).
> And you are right, we could share ideas with the autovacuum process and
> fork worker processes as needed.
> "autovacuum checks for tables that have had a large number of inserted,
> updated or deleted tuples. These checks use the statistics collection
> facility"
> The diskquota process is similar to autovacuum in caring about inserts,
> but the difference is that it also cares about VACUUM FULL, TRUNCATE and
> DROP, while UPDATE and DELETE may not be of interest, since no file
> change happens there. So a separate diskquota process is preferred.
>
> So if we implement disk quota as a full native feature, in the first
> version I would prefer to implement the following features:
> 1 Fork a diskquota launcher process under the postmaster's ServerLoop;
> it is long-lived.
> 2 The diskquota launcher process is responsible for creating a diskquota
> worker process for every database.
> 3 Diskquota settings are stored in a separate catalog table for each
> database.
> 4 Initialization stage: the diskquota launcher process creates diskquota
> worker processes for all databases (traversing them as autovacuum does).
> Each worker process calculates the disk usage of the database's objects
> and reads their diskquota settings. If any object exceeds its quota
> limit, it is put into a blacklist in shared memory, which will later be
> used by the enforcement operator. The worker process exits when its work
> is done.
> 5 Running stage: the diskquota launcher process creates a diskquota
> worker process for any database with a large number of INSERT, COPY,
> TRUNCATE, DROP etc. statements, or a CREATE DISK QUOTA statement. The
> worker process updates the file sizes of the objects containing the
> result relation and compares them with the diskquota settings. Again,
> objects exceeding their quota limit are put into the blacklist, and
> removed from the blacklist vice versa. The worker process exits when its
> work is done, and a GUC could control how often a worker process is
> restarted for a specific database. As you know, this GUC also controls
> the delay before we do enforcement.
> 6 Enforcement: when a postgres backend executes a query, it checks the
> blacklist in shared memory to determine whether the query is allowed
> (before execution) or needs to be rolled back (while executing).
>
> If we implemented disk quota as an extension, we could just use a
> background worker to start the diskquota launcher process, and use
> RegisterDynamicBackgroundWorker() to fork child diskquota worker
> processes from the launcher process, as suggested by @Chapman.
> Diskquota settings could be stored in a user table in a separate schema
> for each database (schema and table created by the CREATE EXTENSION
> statement), just like what Heikki has done in the pg_quota project. But
> in this case, we would need to run CREATE EXTENSION in each database
> before a diskquota worker process can be set up for that database.
>
> Any comments on the above design, and which is preferred as the POC,
> native feature or extension?
>
>
> -- Hubert
>
>
>
> On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <pavel.steh...@gmail.com>
> wrote:
>
>>
>>
>> 2018-08-30 16:22 GMT+02:00 Chapman Flack <c...@anastigmatix.net>:
>>
>>> On 08/30/2018 09:57 AM, Hubert Zhang wrote:
>>>
>>> > 2 Keep one worker process for each database, but use a parent/global
>>> > quota worker process to manage the lifecycle of the database-level
>>> > worker processes. It could handle newly created databases (avoiding a
>>> > database restart) and save resources when a database is not in use.
>>> > But this needs the worker processes to be hierarchical; the
>>> > postmaster becomes the grandfather of the database-level worker
>>> > processes in this case.
>>>
>>> I am using background workers this way in 9.5 at $work.
>>>
>>> In my case, one worker lives forever, wakes up on a set period, and
>>> starts a short-lived worker for every database, waiting for each
>>> one before starting the next.
>>>
>>> It was straightforward to implement. Looking back over the code,
>>> I see the global worker assigns its own PID to worker.bgw_notify_pid
>>> of each of its children, and also obtains a handle for each child
>>> from RegisterDynamicBackgroundWorker().
>>>
>>> I imagine the global quota worker would prefer to start workers
>>> for every database and then just wait for notifications from any
>>> of them, but that seems equally straightforward at first glance.
>>
>> There are servers with thousands of databases; a worker per database is
>> not a good idea.
>>
>> It should share ideas, and code, with the autovacuum process.
>>
>> I am not sure how effective an implementation based on bgworkers can
>> be. On servers with a large set of databases and a large set of tables,
>> it may identify a too-big table too late.
>>
>> Isn't it better to implement some quotas on the storage level?
>>
>> Regards
>>
>> Pavel
>>
>>> -Chap
>>
>
> --
> Thanks
>
> Hubert Zhang
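As a concrete reference for the bgworker route discussed above, here is a
minimal sketch - not part of the attached patch - of how a long-lived
launcher could start a short-lived per-database worker via
RegisterDynamicBackgroundWorker(). The library name "diskquota" and the
entry point diskquota_worker_main are hypothetical placeholders:

/*
 * Sketch: launch a short-lived diskquota worker for one database.
 * Assumes an extension library "diskquota" exporting
 * diskquota_worker_main(Datum) as the worker entry point.
 */
#include "postgres.h"

#include "miscadmin.h"
#include "postmaster/bgworker.h"

static BackgroundWorkerHandle *
launch_worker_for_database(Oid dboid)
{
	BackgroundWorker worker;
	BackgroundWorkerHandle *handle;

	memset(&worker, 0, sizeof(worker));
	worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
		BGWORKER_BACKEND_DATABASE_CONNECTION;
	worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
	worker.bgw_restart_time = BGW_NEVER_RESTART;	/* exits when work is done */
	snprintf(worker.bgw_library_name, BGW_MAXLEN, "diskquota");
	snprintf(worker.bgw_function_name, BGW_MAXLEN, "diskquota_worker_main");
	snprintf(worker.bgw_name, BGW_MAXLEN, "diskquota worker for db %u", dboid);
	worker.bgw_main_arg = ObjectIdGetDatum(dboid);
	worker.bgw_notify_pid = MyProcPid;	/* launcher gets SIGUSR1 on start/exit */

	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
		return NULL;			/* out of max_worker_processes slots */

	return handle;
}

The worker entry point would connect with
BackgroundWorkerInitializeConnectionByOid() and exit when its scan is done;
with bgw_restart_time = BGW_NEVER_RESTART nothing stays resident per
database, which is one way to address the thousands-of-databases concern.
The launcher can wait on the returned handle with
WaitForBackgroundWorkerShutdown() to serialize workers, as Chapman does, or
collect the notifications and run several concurrently.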
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index bb96881cad..619163e237 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -35,6 +35,7 @@
 #include "storage/bufmgr.h"
 #include "storage/relfilenode.h"
 #include "storage/smgr.h"
+#include "utils/guc.h"
 #include "utils/hsearch.h"
 #include "utils/memutils.h"
 #include "pg_trace.h"
@@ -198,6 +199,11 @@ static MdfdVec *_mdfd_getseg(SMgrRelation reln, ForkNumber forkno,
 static BlockNumber _mdnblocks(SMgrRelation reln, ForkNumber forknum,
 			MdfdVec *seg);
 
+/*
+ * limits for relations size
+ */
+int		max_blocks;
+int		warn_blocks;
 
 /*
  *	mdinit() -- Initialize private state for magnetic disk storage manager.
@@ -504,6 +510,14 @@ mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 	Assert(blocknum >= mdnblocks(reln, forknum));
 #endif
 
+	if (max_blocks != -1 && blocknum > (BlockNumber) max_blocks)
+		ereport(ERROR,
+				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+				 errmsg("cannot extend file beyond %u blocks",
+						max_blocks),
+				 errhint("The session file limit defined by \"hard_relation_limit\" (%s) was exceeded.",
+						 GetConfigOptionByName("hard_relation_limit", NULL, false))));
+
 	/*
 	 * If a relation manages to grow to 2^32-1 blocks, refuse to extend it any
 	 * more --- we mustn't create a block whose number actually is
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 9b533d52c8..ecc7906af2 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -161,6 +161,12 @@ static int	syslog_facility = LOG_LOCAL0;
 static int	syslog_facility = 0;
 #endif
 
+/*
+ * limits for relations size
+ */
+extern int	max_blocks;
+extern int	warn_blocks;
+
 static void assign_syslog_facility(int newval, void *extra);
 static void assign_syslog_ident(const char *newval, void *extra);
 static void assign_session_replication_role(int newval, void *extra);
@@ -2910,6 +2916,17 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"hard_relation_limit", PGC_USERSET, RESOURCES,
+			gettext_noop("Sets the maximum number of blocks in a relation."),
+			gettext_noop("The default is -1 (turning this feature off)."),
+			GUC_UNIT_BLOCKS | GUC_NOT_IN_SAMPLE
+		},
+		&max_blocks,
+		-1, -1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"track_activity_query_size", PGC_POSTMASTER, RESOURCES_MEM,
 			gettext_noop("Sets the size reserved for pg_stat_activity.query, in bytes."),
diff --git a/src/test/regress/expected/guc.out b/src/test/regress/expected/guc.out
index 43ac5f5f11..6a3e090c9d 100644
--- a/src/test/regress/expected/guc.out
+++ b/src/test/regress/expected/guc.out
@@ -751,6 +751,12 @@ select current_setting('nosuch.setting', true);
  nada
 (1 row)
 
+-- set limits for tables
+set hard_relation_limit = '1MB';
+create table test_limit_hard as select 1 a, 2 b, 3 c, 4 d from generate_series(1,100000);
+ERROR:  cannot extend file beyond 128 blocks
+HINT:  The session file limit defined by "hard_relation_limit" (1MB) was exceeded.
+set hard_relation_limit to default;
 -- Normally, CREATE FUNCTION should complain about invalid values in
 -- function SET options; but not if check_function_bodies is off,
 -- because that creates ordering hazards for pg_dump
diff --git a/src/test/regress/sql/guc.sql b/src/test/regress/sql/guc.sql
index 23e5029780..bee38c013d 100644
--- a/src/test/regress/sql/guc.sql
+++ b/src/test/regress/sql/guc.sql
@@ -271,6 +271,15 @@ select current_setting('nosuch.setting');
 select current_setting('nosuch.setting', false);
 select current_setting('nosuch.setting', true);
 
+
+-- set limits for tables
+set hard_relation_limit = '1MB';
+
+create table test_limit_hard as select 1 a, 2 b, 3 c, 4 d from generate_series(1,100000);
+
+set hard_relation_limit to default;
+
+
 -- Normally, CREATE FUNCTION should complain about invalid values in
 -- function SET options; but not if check_function_bodies is off,
 -- because that creates ordering hazards for pg_dump
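For completeness, a possible usage sketch assuming the patch above is
applied. Since hard_relation_limit is declared PGC_USERSET, it can be set
per session as in the regression test, or pinned per role or per database
through the standard GUC machinery (the role and database names below are
made up):

-- per session, as in the regression test above
set hard_relation_limit = '1GB';

-- or persistently for a role or a database (hypothetical names)
alter role etl_writer set hard_relation_limit = '10GB';
alter database scratch set hard_relation_limit = '512MB';

-- back to the default (-1), which disables the limit
set hard_relation_limit to default;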