On Fri, Sep 21, 2018 at 13:32, Hubert Zhang <hzh...@pivotal.io> wrote:
> Hi all,
>
> We have redesigned the disk quota feature based on the comments from
> Pavel Stehule and Chapman Flack. Here is the new design.
>
> Overview
>
> Basically, the disk quota feature is meant to support multi-tenancy
> environments: different levels of database objects can be given a quota
> limit to avoid overuse of disk space. A common case is as follows: the
> DBA enables disk quota on a specified list of databases and sets disk
> quota limits for tables/schemas/roles in these databases. A separate
> disk quota worker process monitors the disk usage of these objects and
> detects the objects which exceed their quota limit. Queries loading
> data into these "out of disk quota" tables/schemas/roles will be
> cancelled.
>
> We are currently at the initial implementation stage. We would like to
> propose our idea first and get feedback from the community so we can
> iterate quickly.
>
> SQL Syntax (how to use disk quota)
>
> 1. Specify the databases with disk quota enabled in the GUC
> "diskquota_databases" in postgresql.conf and restart the database.
>
> 2. The DBA can set a disk quota limit for a table/schema/role:
>
> CREATE DISK QUOTA tablea1 ON TABLE a1 WITH (quota = '1MB');
> CREATE DISK QUOTA roleu1 ON USER u1 WITH (quota = '1GB');
> CREATE DISK QUOTA schemas1 ON SCHEMA s1 WITH (quota = '3MB');

just a fast reaction - why a QUOTA object? Isn't ALTER ... SET enough?
Something like

ALTER TABLE a1 SET quota = '1MB';
ALTER USER ...
ALTER SCHEMA ...

A new DDL command looks like too hard a hammer.

> 3. Simulate a case where a schema runs out of its quota limit; suppose
> table a1 and table a2 are both under schema s1:
>
> INSERT INTO a1 SELECT generate_series(1,1000);
> INSERT INTO a2 SELECT generate_series(1,3000000);
> SELECT pg_sleep(5);
> INSERT INTO a1 SELECT generate_series(1,1000);
> ERROR:  schema's disk space quota exceeded
> DROP TABLE a2;
> SELECT pg_sleep(5);
> INSERT INTO a1 SELECT generate_series(1,1000);
> INSERT 0 1000
>
> Architecture
>
> Disk quota has the following components.
>
> 1. Quota Setting Store is where the disk quota settings are stored and
> accessed. We plan to use a catalog table pg_diskquota to store this
> information. pg_diskquota looks like:
>
> CATALOG(pg_diskquota,6122,DiskQuotaRelationId)
> {
>     NameData    quotaname;      /* diskquota name */
>     int16       quotatype;      /* diskquota type name */
>     Oid         quotatargetoid; /* diskquota target db object oid */
>     int32       quotalimit;     /* diskquota size limit in MB */
>     int32       quotaredzone;   /* diskquota redzone in MB */
> } FormData_pg_diskquota;
>
> 2. Quota Change Detector is the monitor of size changes of database
> objects. We plan to use the stats collector to detect the 'active'
> table list at the initial stage. But the stats collector has some
> limitations in finding an active table inside a running transaction;
> details are in the TODO section.
>
> 3. Quota Size Checker is where object sizes are calculated and
> compared with the quota limits of the database objects. Following
> Pavel's comment, the autovacuum launcher and worker processes are a
> good reference for disk quota, so we plan to use one disk quota
> launcher daemon process and several disk quota worker processes to do
> this work. The launcher process is responsible for starting worker
> processes based on a user-defined database list from a GUC. Each
> worker process connects to its target database and monitors the disk
> usage of the objects in that database. In its init stage, a worker
> process calls calculate_total_relation_size() to calculate the size of
> each user table. After the init stage, the worker process refreshes
> the disk model every N seconds; refreshing only recalculates the sizes
> of the tables in the 'active' table list, which is generated by the
> Quota Change Detector, to minimize the cost.
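> To make the refresh loop concrete, here is a minimal sketch (only
> pg_usleep() and the List API are existing PostgreSQL calls; the helper
> names and the GUC are placeholders for code we have not written yet):
>
>     /* One refresh cycle of a disk quota worker (sketch only). */
>     for (;;)
>     {
>         /* 'Active' tables reported by the Quota Change Detector. */
>         List     *active_tables = fetch_active_table_list(); /* placeholder */
>         ListCell *lc;
>
>         foreach(lc, active_tables)
>         {
>             Oid relid = lfirst_oid(lc);
>
>             /* Re-size only the changed table, not the whole database. */
>             refresh_relation_size(relid);   /* placeholder */
>         }
>
>         /* Roll table sizes up to schema/role totals and compare them
>          * with the limits stored in pg_diskquota. */
>         check_quota_limits();               /* placeholder */
>
>         pg_usleep(diskquota_naptime * 1000000L);  /* every N seconds */
>     }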
> 4. Quota Enforcement Operator is where the quota limits are checked on
> the postgres backend side. We will first implement it in
> ExecCheckRTPerms() as pre-running enforcement: it checks the disk
> quota of the tables being inserted into or updated, and reports an
> error if the quota limit of the table, the table's schema or the
> table's owner is exceeded. As a native feature, we plan to add more
> checkpoints to do running-query enforcement. For example, if a disk
> quota has 10MB left, a query could still insert 1GB of data: such a
> query would be allowed by the pre-running enforcement check, but will
> be cancelled by the running-query enforcement check. This improves the
> accuracy of disk quota usage. To achieve this, we plan to add a
> checkpoint in a lower API such as smgrextend(): the Quota Enforcement
> Operator will check the disk quota usage whenever smgrextend() is
> called, and if the quota is exceeded, the current query will be
> cancelled (a sketch follows the Highlights section).
>
> Highlights
>
> 1. Native feature. Support native CREATE/DROP DISK QUOTA SQL
> statements, and a new catalog table pg_diskquota to store disk quota
> settings.
>
> 2. Automatic DML/DDL detection. Table
> create/update/insert/delete/vacuum/truncate/drop/schema_change/owner_change,
> schema create/drop and role create/drop will be detected by disk quota
> automatically.
>
> 3. Low-cost disk quota checker. The disk quota worker process needs to
> refresh the disk usage model every N seconds. Since recalculating file
> sizes with the stat() system call is expensive for a large number of
> files, we use the 'active' table list to reduce the real work in each
> iteration. A basic experiment with our initial implementation on a
> database with 20K tables shows that a refresh costs 1% CPU usage and
> finishes within 50ms.
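> The running-query checkpoint described in component 4 could look
> roughly like this (a sketch against the current smgrextend()
> signature; quota_blacklist_contains() is a placeholder for the
> shared-memory lookup, not an existing function):
>
>     void
>     smgrextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
>                char *buffer, bool skipFsync)
>     {
>         /* Cancel the query before physically growing a relation whose
>          * table/schema/owner is already over quota. */
>         if (quota_blacklist_contains(&reln->smgr_rnode.node))
>             ereport(ERROR,
>                     (errcode(ERRCODE_DISK_FULL),
>                      errmsg("disk quota exceeded for relation %u",
>                             reln->smgr_rnode.node.relNode)));
>
>         smgrsw[reln->smgr_which].smgr_extend(reln, forknum, blocknum,
>                                              buffer, skipFsync);
>     }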
> Todo/Limitations
>
> Before we propose our patch, we plan to enhance it with the following
> ideas:
>
> 1. Set the list of databases with disk quota enabled dynamically,
> without restarting the database. Since we have the disk quota launcher
> process, it could detect a new 'diskquota_databases' list and start or
> stop the corresponding disk quota worker processes.
>
> 2. Enforcement while a query is running. Consider the case where 10MB
> of quota is left, but the next query will insert 10GB of data; the
> current enforcement design will allow this query to be executed. This
> is limited by the fact that the 'active' table list is generated by
> the stats collector, and a postgres backend only sends table stats to
> the collector when the transaction ends. We need a new way to detect
> an 'active' table even while it is being modified inside a running
> transaction.
>
> 3. Monitor an unlimited number of databases. Currently we cap the
> number of disk quota worker processes at 10 to reduce the impact on
> normal workloads. But what if we want to monitor the disk quota of
> more than 10 databases? Our solution is to let the disk quota launcher
> manage a queue of databases that need to be monitored. A disk quota
> worker process consumes the queue and refreshes the disk usage/quota
> for its database; after some period, the worker returns the database
> to the queue and fetches the next database from the head of the queue.
> The period determines the delay in detecting disk quota changes. To
> implement this, we need to allow a subprocess of the postmaster to
> rebind to another database instead of the one bound in InitPostgres().
>
> 4. Support active table detection for VACUUM FULL and VACUUM ANALYZE.
> Currently VACUUM FULL and VACUUM ANALYZE are not tracked by the stats
> collector.
>
> Thanks to Heikki, Pavel Stehule and Chapman Flack for the earlier
> comments on the disk quota feature. Any comments on how to improve it
> are appreciated.
>
> On Mon, Sep 3, 2018 at 12:05 PM, Pavel Stehule
> <pavel.steh...@gmail.com> wrote:
>
>> 2018-09-03 3:49 GMT+02:00 Hubert Zhang <hzh...@pivotal.io>:
>>
>>> Thanks Pavel.
>>> Your patch did enforcement at the storage level (md.c; we could also
>>> use smgr_extend). It's straightforward. But I prefer to implement
>>> disk_quota as a feature with the following objectives:
>>> 1. Set/alter disk quota settings on different database objects, e.g.
>>> user, database, schema etc. - not only a general GUC; we could set a
>>> separate quota limit for a specific object.
>>> 2. The enforcement operator should work at two points: before a
>>> query runs and while a query is running. The latter's implementation
>>> may be similar to your patch.
>>
>> The patch was just an example. Resource quotas should be more complex
>> - per partition, table, schema, database, user - so GUCs are
>> possible, but not very user friendly.
>>
>> Our case is specific, but not too much. The servers are used for
>> multidimensional analyses, and some tables can grow too fast (COPY,
>> INSERT SELECT), so we need to enforce limits immediately. The
>> implementation is simple, so I did it. The same implementation at the
>> database or schema level needs some more locks, so it will not be as
>> effective. Resource management can become very complex, and I expect
>> it will be hard work.
>>
>> Regards
>>
>> Pavel
>>
>>> On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule
>>> <pavel.steh...@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> 2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzh...@pivotal.io>:
>>>>
>>>>> Thanks Chapman.
>>>>> @Pavel, could you please explain more about your second suggestion
>>>>> "implement some quotas on storage level"?
>>>>
>>>> See attached patch - it is very simple - and good enough for our
>>>> purposes.
>>>>
>>>> Regards
>>>>
>>>> Pavel
>>>>
>>>>> We will not keep long-lived processes attached to all databases
>>>>> (just as you mentioned, there are servers with thousands of
>>>>> databases). And you are right, we could share ideas with the
>>>>> autovacuum process and fork worker processes as needed.
>>>>> "autovacuum checks for tables that have had a large number of
>>>>> inserted, updated or deleted tuples. These checks use the
>>>>> statistics collection facility."
>>>>> The diskquota process is similar to autovacuum in that it cares
>>>>> about inserts, but the difference is that it also cares about
>>>>> VACUUM FULL, TRUNCATE and DROP, while UPDATE and DELETE may not be
>>>>> of interest since no file change happens. So a separate diskquota
>>>>> process is preferred.
>>>>>
>>>>> So if we implemented disk quota as a full native feature, in the
>>>>> first version I would prefer to implement the following (a rough
>>>>> sketch of item 6 follows the list):
>>>>> 1. Fork a diskquota launcher process under the postmaster
>>>>> ServerLoop; it is long-lived.
>>>>> 2. The diskquota launcher process is responsible for creating a
>>>>> diskquota worker process for each database.
>>>>> 3. Diskquota settings are stored in a separate catalog table for
>>>>> each database.
>>>>> 4. Initialization stage: the diskquota launcher creates diskquota
>>>>> worker processes for all the databases (traversing them like
>>>>> autovacuum). A worker process calculates the disk usage of db
>>>>> objects and reads their diskquota settings. If any db object
>>>>> exceeds its quota limit, it is put into the blacklist in shared
>>>>> memory, which is later used by the enforcement operator. The
>>>>> worker process exits when its work is done.
>>>>> 5. Running stage: the diskquota launcher creates a diskquota
>>>>> worker process for a database with a large number of INSERT, COPY,
>>>>> TRUNCATE, DROP etc. statements, or a CREATE DISK QUOTA statement.
>>>>> The worker process updates the file sizes of the db objects
>>>>> containing the result relation and compares them with the
>>>>> diskquota settings. Again, if a quota limit is exceeded, the
>>>>> object is put into the blacklist, and removed from the blacklist
>>>>> vice versa. The worker process exits when its work is done, and a
>>>>> GUC controls how often a worker process is restarted for a
>>>>> specific database. As you know, this GUC also controls the
>>>>> enforcement delay.
>>>>> 6. Enforcement: when a postgres backend executes a query, it
>>>>> checks the blacklist in shared memory to determine whether the
>>>>> query is allowed (before execution) or needs to be rolled back
>>>>> (while executing).
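>>>>> The before-execution half of item 6 could hook into
>>>>> ExecCheckRTPerms() roughly like this (a sketch only;
>>>>> diskquota_is_blacklisted() is a placeholder for the shared-memory
>>>>> blacklist probe, not an existing function):
>>>>>
>>>>>     bool
>>>>>     ExecCheckRTPerms(List *rangeTable, bool ereport_on_violation)
>>>>>     {
>>>>>         ListCell *l;
>>>>>
>>>>>         foreach(l, rangeTable)
>>>>>         {
>>>>>             RangeTblEntry *rte = (RangeTblEntry *) lfirst(l);
>>>>>
>>>>>             /* Only writes can push disk usage past a quota. */
>>>>>             if (rte->rtekind == RTE_RELATION &&
>>>>>                 (rte->requiredPerms & (ACL_INSERT | ACL_UPDATE)) &&
>>>>>                 diskquota_is_blacklisted(rte->relid))
>>>>>                 ereport(ERROR,
>>>>>                         (errcode(ERRCODE_DISK_FULL),
>>>>>                          errmsg("disk quota exceeded for relation \"%s\"",
>>>>>                                 get_rel_name(rte->relid))));
>>>>>         }
>>>>>
>>>>>         /* ... existing permission checks continue here ... */
>>>>>         return true;
>>>>>     }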
>>>>> If we implemented disk quota as an extension, we could just use a
>>>>> background worker to start the diskquota launcher process and use
>>>>> RegisterDynamicBackgroundWorker() to fork child diskquota worker
>>>>> processes from the launcher process, as suggested by @Chapman.
>>>>> Diskquota settings could be stored in a user table in a separate
>>>>> schema for each database (schema and table created by the CREATE
>>>>> EXTENSION statement), just like what Heikki has done in the
>>>>> pg_quota project. But in this case, we need to create the
>>>>> extension in each database before a diskquota worker process can
>>>>> be set up for that database.
>>>>>
>>>>> Any comments on the above design, and which is preferred as the
>>>>> POC, native feature or extension?
>>>>>
>>>>> -- Hubert
>>>>>
>>>>> On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule
>>>>> <pavel.steh...@gmail.com> wrote:
>>>>>
>>>>>> 2018-08-30 16:22 GMT+02:00 Chapman Flack <c...@anastigmatix.net>:
>>>>>>
>>>>>>> On 08/30/2018 09:57 AM, Hubert Zhang wrote:
>>>>>>>
>>>>>>> > 2 Keep one worker process for each database. But using a
>>>>>>> > parent/global quota worker process to manage the lifecycle of
>>>>>>> > database-level worker processes. It could handle newly created
>>>>>>> > databases (avoid restarting the database) and save resources
>>>>>>> > when a database is not used. But this needs worker processes
>>>>>>> > to be hierarchical. The postmaster becomes the grandfather of
>>>>>>> > the database-level worker processes in this case.
>>>>>>>
>>>>>>> I am using background workers this way in 9.5 at $work.
>>>>>>>
>>>>>>> In my case, one worker lives forever, wakes up on a set period,
>>>>>>> and starts a short-lived worker for every database, waiting for
>>>>>>> each one before starting the next.
>>>>>>>
>>>>>>> It was straightforward to implement. Looking back over the code,
>>>>>>> I see the global worker assigns its own PID to
>>>>>>> worker.bgw_notify_pid of each of its children, and also obtains
>>>>>>> a handle for each child from RegisterDynamicBackgroundWorker().
>>>>>>>
>>>>>>> I imagine the global quota worker would prefer to start workers
>>>>>>> for every database and then just wait for notifications from any
>>>>>>> of them, but that seems equally straightforward at first glance.
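>>>>>>> For reference, that registration pattern looks roughly like the
>>>>>>> sketch below (not the actual $work code; the library and
>>>>>>> function names are placeholders):
>>>>>>>
>>>>>>>     BackgroundWorker        worker;
>>>>>>>     BackgroundWorkerHandle *handle;
>>>>>>>
>>>>>>>     /* dboid: the target database, taken from the global
>>>>>>>      * worker's database list. */
>>>>>>>     memset(&worker, 0, sizeof(worker));
>>>>>>>     worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
>>>>>>>                        BGWORKER_BACKEND_DATABASE_CONNECTION;
>>>>>>>     worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
>>>>>>>     worker.bgw_restart_time = BGW_NEVER_RESTART;
>>>>>>>     snprintf(worker.bgw_library_name, BGW_MAXLEN, "diskquota");
>>>>>>>     snprintf(worker.bgw_function_name, BGW_MAXLEN,
>>>>>>>              "per_db_worker_main");
>>>>>>>     snprintf(worker.bgw_name, BGW_MAXLEN,
>>>>>>>              "quota worker for db %u", dboid);
>>>>>>>     worker.bgw_main_arg = ObjectIdGetDatum(dboid);
>>>>>>>     /* Postmaster signals this (parent) process on child
>>>>>>>      * start/stop. */
>>>>>>>     worker.bgw_notify_pid = MyProcPid;
>>>>>>>
>>>>>>>     if (RegisterDynamicBackgroundWorker(&worker, &handle))
>>>>>>>         WaitForBackgroundWorkerShutdown(handle); /* one at a time */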
>>>>>> There are servers with thousands of databases. A worker per
>>>>>> database is not a good idea.
>>>>>>
>>>>>> It should share ideas, and code, with the autovacuum process.
>>>>>>
>>>>>> I am not sure how effective an implementation based on bg workers
>>>>>> can be. On servers with a large set of databases and a large set
>>>>>> of tables, it can identify a too-big table too late.
>>>>>>
>>>>>> Isn't it better to implement some quotas on the storage level?
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Pavel
>>>>>>
>>>>>>> -Chap
>>>>>
>>>>> --
>>>>> Thanks
>>>>>
>>>>> Hubert Zhang
>>>
>>> --
>>> Thanks
>>>
>>> Hubert Zhang
>
> --
> Thanks
>
> Hubert Zhang