On Fri, Sep 21, 2018 at 13:32, Hubert Zhang <hzh...@pivotal.io> wrote:
> Hi all,
>
> We have redesigned the disk quota feature based on the comments from
> Pavel Stehule and Chapman Flack. Here is the new design.
>
> Overview
>
> Basically, the disk quota feature is meant to support multi-tenancy
> environments: different levels of database objects can be given a quota
> limit to avoid overuse of disk space. A common case is as follows: the
> DBA enables disk quota on a specified list of databases and sets disk
> quota limits for tables/schemas/roles in these databases. A separate
> disk quota worker process monitors the disk usage of these objects and
> detects the objects which exceed their quota limit. Queries loading
> data into these "out of disk quota" tables/schemas/roles will be
> cancelled.
>
> We are currently at the initial implementation stage. We would like to
> propose our idea first and get feedback from the community so we can
> iterate quickly.
>
> SQL Syntax (how to use disk quota)
>
> 1. Specify the databases with disk quota enabled in the GUC
> "diskquota_databases" in postgresql.conf and restart the database.
>
> 2. The DBA can set a disk quota limit for a table/schema/role:
>
> CREATE DISK QUOTA tablea1 ON TABLE a1 WITH (quota = '1MB');
> CREATE DISK QUOTA roleu1 ON USER u1 WITH (quota = '1GB');
> CREATE DISK QUOTA schemas1 ON SCHEMA s1 WITH (quota = '3MB');

just a fast reaction - why a QUOTA object? Isn't ALTER ... SET enough?
Something like

ALTER TABLE a1 SET quota = '1MB';
ALTER USER ...
ALTER SCHEMA ...

A new DDL command looks like too hard a hammer.

> 3. Simulate a case where a schema runs out of its quota limit; suppose
> table a1 and table a2 are both under schema s1:
>
> INSERT INTO a1 SELECT generate_series(1,1000);
> INSERT INTO a2 SELECT generate_series(1,3000000);
> SELECT pg_sleep(5);
> INSERT INTO a1 SELECT generate_series(1,1000);
> ERROR:  schema's disk space quota exceeded
> DROP TABLE a2;
> SELECT pg_sleep(5);
> INSERT INTO a1 SELECT generate_series(1,1000);
> INSERT 0 1000
>
> Architecture
>
> Disk quota has the following components.
>
> 1. Quota Setting Store is where the disk quota settings are stored and
> accessed. We plan to use a catalog table pg_diskquota to store this
> information. pg_diskquota looks like:
>
> CATALOG(pg_diskquota,6122,DiskQuotaRelationId)
> {
>     NameData    quotaname;      /* diskquota name */
>     int16       quotatype;      /* diskquota type name */
>     Oid         quotatargetoid; /* diskquota target db object oid */
>     int32       quotalimit;     /* diskquota size limit in MB */
>     int32       quotaredzone;   /* diskquota redzone in MB */
> } FormData_pg_diskquota;
>
> 2. Quota Change Detector is the monitor of size changes of database
> objects. We plan to use the stats collector to detect the 'active'
> table list at the initial stage. But the stats collector has some
> limitations in finding an active table inside a running transaction;
> details are in the TODO section.
>
> 3. Quota Size Checker is where object sizes are calculated and
> compared with the quota limits of the database objects. Following
> Pavel's comment, the autovacuum launcher and worker processes are a
> good reference for disk quota, so we plan to use one disk quota
> launcher daemon process and several disk quota worker processes to do
> this work. The launcher process is responsible for starting worker
> processes based on a user-defined database list from a GUC. Each
> worker process connects to its target database and monitors the disk
> usage of the objects in that database. In its init stage, a worker
> process calls calculate_total_relation_size() to calculate the size of
> each user table. After the init stage, the worker process refreshes
> the disk model every N seconds; refreshing only recalculates the sizes
> of the tables in the 'active' table list, which is generated by the
> Quota Change Detector, to minimize the cost.
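> To make the refresh loop concrete, here is a minimal sketch (only
> pg_usleep() and the List API are existing PostgreSQL calls; the helper
> names and the GUC are placeholders for code we have not written yet):
>
>     /* One refresh cycle of a disk quota worker (sketch only). */
>     for (;;)
>     {
>         /* 'Active' tables reported by the Quota Change Detector. */
>         List     *active_tables = fetch_active_table_list(); /* placeholder */
>         ListCell *lc;
>
>         foreach(lc, active_tables)
>         {
>             Oid relid = lfirst_oid(lc);
>
>             /* Re-size only the changed table, not the whole database. */
>             refresh_relation_size(relid);   /* placeholder */
>         }
>
>         /* Roll table sizes up to schema/role totals and compare them
>          * with the limits stored in pg_diskquota. */
>         check_quota_limits();               /* placeholder */
>
>         pg_usleep(diskquota_naptime * 1000000L);  /* every N seconds */
>     }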
> 4. Quota Enforcement Operator is where the quota limits are checked on
> the postgres backend side. We will first implement it in
> ExecCheckRTPerms() as pre-running enforcement: it checks the disk
> quota of the tables being inserted into or updated, and reports an
> error if the quota limit of the table, the table's schema or the
> table's owner is exceeded. As a native feature, we plan to add more
> checkpoints to do running-query enforcement. For example, if a disk
> quota has 10MB left, a query could still insert 1GB of data: such a
> query would be allowed by the pre-running enforcement check, but will
> be cancelled by the running-query enforcement check. This improves the
> accuracy of disk quota usage. To achieve this, we plan to add a
> checkpoint in a lower API such as smgrextend(): the Quota Enforcement
> Operator will check the disk quota usage whenever smgrextend() is
> called, and if the quota is exceeded, the current query will be
> cancelled (a sketch follows the Highlights section).
>
> Highlights
>
> 1. Native feature. Support native CREATE/DROP DISK QUOTA SQL
> statements, and a new catalog table pg_diskquota to store disk quota
> settings.
>
> 2. Automatic DML/DDL detection. Table
> create/update/insert/delete/vacuum/truncate/drop/schema_change/owner_change,
> schema create/drop and role create/drop will be detected by disk quota
> automatically.
>
> 3. Low-cost disk quota checker. The disk quota worker process needs to
> refresh the disk usage model every N seconds. Since recalculating file
> sizes with the stat() system call is expensive for a large number of
> files, we use the 'active' table list to reduce the real work in each
> iteration. A basic experiment with our initial implementation on a
> database with 20K tables shows that a refresh costs 1% CPU usage and
> finishes within 50ms.
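> The running-query checkpoint described in component 4 could look
> roughly like this (a sketch against the current smgrextend()
> signature; quota_blacklist_contains() is a placeholder for the
> shared-memory lookup, not an existing function):
>
>     void
>     smgrextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
>                char *buffer, bool skipFsync)
>     {
>         /* Cancel the query before physically growing a relation whose
>          * table/schema/owner is already over quota. */
>         if (quota_blacklist_contains(&reln->smgr_rnode.node))
>             ereport(ERROR,
>                     (errcode(ERRCODE_DISK_FULL),
>                      errmsg("disk quota exceeded for relation %u",
>                             reln->smgr_rnode.node.relNode)));
>
>         smgrsw[reln->smgr_which].smgr_extend(reln, forknum, blocknum,
>                                              buffer, skipFsync);
>     }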
> Todo/Limitations
>
> Before we propose our patch, we plan to enhance it with the following
> ideas:
>
> 1. Set the list of databases with disk quota enabled dynamically,
> without restarting the database. Since we have the disk quota launcher
> process, it could detect a new 'diskquota_databases' list and start or
> stop the corresponding disk quota worker processes.
>
> 2. Enforcement while a query is running. Consider the case where 10MB
> of quota is left, but the next query will insert 10GB of data; the
> current enforcement design will allow this query to be executed. This
> is limited by the fact that the 'active' table list is generated by
> the stats collector, and a postgres backend only sends table stats to
> the collector when the transaction ends. We need a new way to detect
> an 'active' table even while it is being modified inside a running
> transaction.
>
> 3. Monitor an unlimited number of databases. Currently we cap the
> number of disk quota worker processes at 10 to reduce the impact on
> normal workloads. But what if we want to monitor the disk quota of
> more than 10 databases? Our solution is to let the disk quota launcher
> manage a queue of databases that need to be monitored. A disk quota
> worker process consumes the queue and refreshes the disk usage/quota
> for its database; after some period, the worker returns the database
> to the queue and fetches the next database from the head of the queue.
> The period determines the delay in detecting disk quota changes. To
> implement this, we need to allow a subprocess of the postmaster to
> rebind to another database instead of the one bound in InitPostgres().
>
> 4. Support active table detection for VACUUM FULL and VACUUM ANALYZE.
> Currently VACUUM FULL and VACUUM ANALYZE are not tracked by the stats
> collector.
>
> Thanks to Heikki, Pavel Stehule and Chapman Flack for the earlier
> comments on the disk quota feature. Any comments on how to improve it
> are appreciated.
>
> On Mon, Sep 3, 2018 at 12:05 PM, Pavel Stehule
> <pavel.steh...@gmail.com> wrote:
>
>> 2018-09-03 3:49 GMT+02:00 Hubert Zhang <hzh...@pivotal.io>:
>>
>>> Thanks Pavel.
>>> Your patch did enforcement at the storage level (md.c; we could also
>>> use smgr_extend). It's straightforward. But I prefer to implement
>>> disk_quota as a feature with the following objectives:
>>> 1. Set/alter disk quota settings on different database objects, e.g.
>>> user, database, schema etc. - not only a general GUC; we could set a
>>> separate quota limit for a specific object.
>>> 2. The enforcement operator should work at two points: before a
>>> query runs and while a query is running. The latter's implementation
>>> may be similar to your patch.
>>
>> The patch was just an example. Resource quotas should be more complex
>> - per partition, table, schema, database, user - so GUCs are
>> possible, but not very user friendly.
>>
>> Our case is specific, but not too much. The servers are used for
>> multidimensional analyses, and some tables can grow too fast (COPY,
>> INSERT SELECT), so we need to enforce limits immediately. The
>> implementation is simple, so I did it. The same implementation at the
>> database or schema level needs some more locks, so it will not be as
>> effective. Resource management can become very complex, and I expect
>> it will be hard work.
>>
>> Regards
>>
>> Pavel
>>
>>> On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule
>>> <pavel.steh...@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> 2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzh...@pivotal.io>:
>>>>
>>>>> Thanks Chapman.
>>>>> @Pavel, could you please explain more about your second suggestion
>>>>> "implement some quotas on storage level"?
>>>>
>>>> See attached patch - it is very simple - and good enough for our
>>>> purposes.
>>>>
>>>> Regards
>>>>
>>>> Pavel
>>>>
>>>>> We will not keep long-lived processes attached to all databases
>>>>> (just as you mentioned, there are servers with thousands of
>>>>> databases). And you are right, we could share ideas with the
>>>>> autovacuum process and fork worker processes as needed.
>>>>> "autovacuum checks for tables that have had a large number of
>>>>> inserted, updated or deleted tuples. These checks use the
>>>>> statistics collection facility."
>>>>> The diskquota process is similar to autovacuum in that it cares
>>>>> about inserts, but the difference is that it also cares about
>>>>> VACUUM FULL, TRUNCATE and DROP, while UPDATE and DELETE may not be
>>>>> of interest since no file change happens. So a separate diskquota
>>>>> process is preferred.
>>>>>
>>>>> So if we implemented disk quota as a full native feature, in the
>>>>> first version I would prefer to implement the following (a rough
>>>>> sketch of item 6 follows the list):
>>>>> 1. Fork a diskquota launcher process under the postmaster
>>>>> ServerLoop; it is long-lived.
>>>>> 2. The diskquota launcher process is responsible for creating a
>>>>> diskquota worker process for each database.
>>>>> 3. Diskquota settings are stored in a separate catalog table for
>>>>> each database.
>>>>> 4. Initialization stage: the diskquota launcher creates diskquota
>>>>> worker processes for all the databases (traversing them like
>>>>> autovacuum). A worker process calculates the disk usage of db
>>>>> objects and reads their diskquota settings. If any db object
>>>>> exceeds its quota limit, it is put into the blacklist in shared
>>>>> memory, which is later used by the enforcement operator. The
>>>>> worker process exits when its work is done.
>>>>> 5. Running stage: the diskquota launcher creates a diskquota
>>>>> worker process for a database with a large number of INSERT, COPY,
>>>>> TRUNCATE, DROP etc. statements, or a CREATE DISK QUOTA statement.
>>>>> The worker process updates the file sizes of the db objects
>>>>> containing the result relation and compares them with the
>>>>> diskquota settings. Again, if a quota limit is exceeded, the
>>>>> object is put into the blacklist, and removed from the blacklist
>>>>> vice versa. The worker process exits when its work is done, and a
>>>>> GUC controls how often a worker process is restarted for a
>>>>> specific database. As you know, this GUC also controls the
>>>>> enforcement delay.
>>>>> 6. Enforcement: when a postgres backend executes a query, it
>>>>> checks the blacklist in shared memory to determine whether the
>>>>> query is allowed (before execution) or needs to be rolled back
>>>>> (while executing).
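>>>>> The before-execution half of item 6 could hook into
>>>>> ExecCheckRTPerms() roughly like this (a sketch only;
>>>>> diskquota_is_blacklisted() is a placeholder for the shared-memory
>>>>> blacklist probe, not an existing function):
>>>>>
>>>>>     bool
>>>>>     ExecCheckRTPerms(List *rangeTable, bool ereport_on_violation)
>>>>>     {
>>>>>         ListCell *l;
>>>>>
>>>>>         foreach(l, rangeTable)
>>>>>         {
>>>>>             RangeTblEntry *rte = (RangeTblEntry *) lfirst(l);
>>>>>
>>>>>             /* Only writes can push disk usage past a quota. */
>>>>>             if (rte->rtekind == RTE_RELATION &&
>>>>>                 (rte->requiredPerms & (ACL_INSERT | ACL_UPDATE)) &&
>>>>>                 diskquota_is_blacklisted(rte->relid))
>>>>>                 ereport(ERROR,
>>>>>                         (errcode(ERRCODE_DISK_FULL),
>>>>>                          errmsg("disk quota exceeded for relation \"%s\"",
>>>>>                                 get_rel_name(rte->relid))));
>>>>>         }
>>>>>
>>>>>         /* ... existing permission checks continue here ... */
>>>>>         return true;
>>>>>     }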
>>>>> If we implemented disk quota as an extension, we could just use a
>>>>> background worker to start the diskquota launcher process and use
>>>>> RegisterDynamicBackgroundWorker() to fork child diskquota worker
>>>>> processes from the launcher process, as suggested by @Chapman.
>>>>> Diskquota settings could be stored in a user table in a separate
>>>>> schema for each database (schema and table created by the CREATE
>>>>> EXTENSION statement), just like what Heikki has done in the
>>>>> pg_quota project. But in this case, we need to create the
>>>>> extension in each database before a diskquota worker process can
>>>>> be set up for that database.
>>>>>
>>>>> Any comments on the above design, and which is preferred as the
>>>>> POC, native feature or extension?
>>>>>
>>>>> -- Hubert
>>>>>
>>>>> On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule
>>>>> <pavel.steh...@gmail.com> wrote:
>>>>>
>>>>>> 2018-08-30 16:22 GMT+02:00 Chapman Flack <c...@anastigmatix.net>:
>>>>>>
>>>>>>> On 08/30/2018 09:57 AM, Hubert Zhang wrote:
>>>>>>>
>>>>>>> > 2 Keep one worker process for each database. But using a
>>>>>>> > parent/global quota worker process to manage the lifecycle of
>>>>>>> > database-level worker processes. It could handle newly created
>>>>>>> > databases (avoid restarting the database) and save resources
>>>>>>> > when a database is not used. But this needs worker processes
>>>>>>> > to be hierarchical. The postmaster becomes the grandfather of
>>>>>>> > the database-level worker processes in this case.
>>>>>>>
>>>>>>> I am using background workers this way in 9.5 at $work.
>>>>>>>
>>>>>>> In my case, one worker lives forever, wakes up on a set period,
>>>>>>> and starts a short-lived worker for every database, waiting for
>>>>>>> each one before starting the next.
>>>>>>>
>>>>>>> It was straightforward to implement. Looking back over the code,
>>>>>>> I see the global worker assigns its own PID to
>>>>>>> worker.bgw_notify_pid of each of its children, and also obtains
>>>>>>> a handle for each child from RegisterDynamicBackgroundWorker().
>>>>>>>
>>>>>>> I imagine the global quota worker would prefer to start workers
>>>>>>> for every database and then just wait for notifications from any
>>>>>>> of them, but that seems equally straightforward at first glance.
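>>>>>>> For reference, that registration pattern looks roughly like the
>>>>>>> sketch below (not the actual $work code; the library and
>>>>>>> function names are placeholders):
>>>>>>>
>>>>>>>     BackgroundWorker        worker;
>>>>>>>     BackgroundWorkerHandle *handle;
>>>>>>>
>>>>>>>     /* dboid: the target database, taken from the global
>>>>>>>      * worker's database list. */
>>>>>>>     memset(&worker, 0, sizeof(worker));
>>>>>>>     worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
>>>>>>>                        BGWORKER_BACKEND_DATABASE_CONNECTION;
>>>>>>>     worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
>>>>>>>     worker.bgw_restart_time = BGW_NEVER_RESTART;
>>>>>>>     snprintf(worker.bgw_library_name, BGW_MAXLEN, "diskquota");
>>>>>>>     snprintf(worker.bgw_function_name, BGW_MAXLEN,
>>>>>>>              "per_db_worker_main");
>>>>>>>     snprintf(worker.bgw_name, BGW_MAXLEN,
>>>>>>>              "quota worker for db %u", dboid);
>>>>>>>     worker.bgw_main_arg = ObjectIdGetDatum(dboid);
>>>>>>>     /* Postmaster signals this (parent) process on child
>>>>>>>      * start/stop. */
>>>>>>>     worker.bgw_notify_pid = MyProcPid;
>>>>>>>
>>>>>>>     if (RegisterDynamicBackgroundWorker(&worker, &handle))
>>>>>>>         WaitForBackgroundWorkerShutdown(handle); /* one at a time */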
>>>>>> There are servers with thousands of databases. A worker per
>>>>>> database is not a good idea.
>>>>>>
>>>>>> It should share ideas, and code, with the autovacuum process.
>>>>>>
>>>>>> I am not sure how effective an implementation based on bg workers
>>>>>> can be. On servers with a large set of databases and a large set
>>>>>> of tables, it can identify a too-big table too late.
>>>>>>
>>>>>> Isn't it better to implement some quotas on the storage level?
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Pavel
>>>>>>
>>>>>>> -Chap
>>>>>
>>>>> --
>>>>> Thanks
>>>>>
>>>>> Hubert Zhang
>>>
>>> --
>>> Thanks
>>>
>>> Hubert Zhang
>
> --
> Thanks
>
> Hubert Zhang