2018-09-03 3:49 GMT+02:00 Hubert Zhang <hzh...@pivotal.io>:

> Thanks Pavel.
> Your patch does enforcement at the storage level (md.c, or we could also
> use smgr_extend). It's straightforward.
> But I prefer to implement disk_quota as a feature with the following
> objectives:
> 1 Set/alter disk quota settings on different database objects, e.g.
> user, database, schema etc.: not only a general GUC, but separate quota
> limits for specific objects.
> 2 The enforcement operator should work at two positions: before a query
> runs and while a query is running. The latter's implementation may be
> similar to your patch.
>
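For illustration, here is a minimal sketch of this kind of storage-level
check, assuming it is called from mdextend() in
src/backend/storage/smgr/md.c just before a relation file is physically
extended. The GUC name and error text are invented and are not
necessarily what the attached patch uses:

    /*
     * Hypothetical storage-level quota check.  Intended to be called from
     * mdextend() before the new block is written; all names here are
     * illustrative, not from any posted patch.
     */
    #include "postgres.h"
    #include "storage/block.h"

    /* hypothetical GUC: maximum relation size in blocks, 0 = unlimited */
    int         quota_max_relation_blocks = 0;

    static void
    check_relation_extension_quota(BlockNumber new_blocknum)
    {
        if (quota_max_relation_blocks > 0 &&
            new_blocknum >= (BlockNumber) quota_max_relation_blocks)
            ereport(ERROR,
                    (errcode(ERRCODE_DISK_FULL),
                     errmsg("cannot extend relation: quota of %d blocks exceeded",
                            quota_max_relation_blocks)));
    }

Every relation extension, whatever the triggering command (COPY, INSERT
... SELECT, etc.), passes through this one choke point, which is why a
check here can stay so small.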
The patch was just an example. The resource quotas should be more
granular: per partition, table, schema, database, user. So GUCs are
possible, but not very user friendly.

Our case is specific, but not that unusual. The servers are used for
multidimensional analyses, and some tables can grow too fast (COPY,
INSERT ... SELECT), so we needed to enforce limits immediately. The
implementation is simple, so I did it. The same implementation at the
database or schema level needs some more locks, so it would not be very
effective.

Resource management can be very complex, and I expect it will be hard
work.

Regards

Pavel

> On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule <pavel.steh...@gmail.com>
> wrote:
>
>> Hi
>>
>> 2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzh...@pivotal.io>:
>>
>>> Thanks Chapman.
>>> @Pavel, could you please explain more about your second suggestion,
>>> "implement some quotas on storage level"?
>>
>> See the attached patch. It is very simple, and good enough for our
>> purposes.
>>
>> Regards
>>
>> Pavel
>>
>>> We will not keep long-lived processes attached to all databases (just
>>> like the servers with thousands of databases you mentioned).
>>> And you are right, we could share ideas with the autovacuum process
>>> and fork worker processes as needed:
>>> "autovacuum checks for tables that have had a large number of
>>> inserted, updated or deleted tuples. These checks use the statistics
>>> collection facility"
>>> The diskquota process is similar to autovacuum in caring about
>>> INSERT, but the difference is that it also cares about VACUUM FULL,
>>> TRUNCATE and DROP, while UPDATE and DELETE may not be interesting
>>> since no file size change happens. So a separate diskquota process is
>>> preferred.
>>>
>>> So if we implement disk quota as a full native feature, I prefer to
>>> implement the following in the first initial version:
>>> 1 Fork a diskquota launcher process from the postmaster server loop;
>>> this process is long-lived.
>>> 2 The diskquota launcher process is responsible for creating a
>>> diskquota worker process for every database.
>>> 3 Diskquota settings are stored in a separate catalog table in each
>>> database.
>>> 4 Initialization stage: the diskquota launcher creates a diskquota
>>> worker process for each database (traversing them like autovacuum).
>>> Each worker calculates the disk usage of the db objects and reads
>>> their diskquota settings. If any db object exceeds its quota limit,
>>> it is put into a blacklist in shared memory, which will later be used
>>> by the enforcement operator. The worker process exits when its work
>>> is done.
>>> 5 Running stage: the diskquota launcher creates a diskquota worker
>>> process for any database with a large number of INSERT, COPY,
>>> TRUNCATE, DROP etc. or CREATE DISK QUOTA statements. The worker
>>> updates the file sizes of the db objects containing the result
>>> relation and compares them with the diskquota settings. Again, if a
>>> quota limit is exceeded, the object is put into the blacklist, and
>>> removed from the blacklist vice versa. The worker process exits when
>>> its work is done, and a GUC could control how often a worker is
>>> restarted for a specific database. As you know, this GUC also
>>> controls the delay before we do enforcement.
>>> 6 Enforcement: when a postgres backend executes a query, it checks
>>> the blacklist in shared memory to determine whether the query is
>>> allowed (before execution) or needs to be rolled back (while
>>> executing). [See the enforcement sketch at the end of this mail.]
>>>
>>> If we implemented disk quota as an extension, we could just use a
>>> background worker to start the diskquota launcher process and use
>>> RegisterDynamicBackgroundWorker() to fork child diskquota worker
>>> processes from the launcher, as suggested by @Chapman. [See the
>>> launcher sketch at the end of this mail.]
>>> Diskquota settings could be stored in a user table in a separate
>>> schema in each database (with the schema and table created by the
>>> CREATE EXTENSION statement), just like what Heikki has done in the
>>> pg_quota project. But in this case, we would need to create the
>>> extension in each database before a diskquota worker process could be
>>> set up for that database.
>>>
>>> Any comments on the above design, and which is preferred as the POC,
>>> native feature or extension?
>>>
>>>
>>> -- Hubert
>>>
>>>
>>> On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule
>>> <pavel.steh...@gmail.com> wrote:
>>>
>>>> 2018-08-30 16:22 GMT+02:00 Chapman Flack <c...@anastigmatix.net>:
>>>>
>>>>> On 08/30/2018 09:57 AM, Hubert Zhang wrote:
>>>>>
>>>>> > 2 Keep one worker process for each database, but use a
>>>>> > parent/global quota worker process to manage the lifecycle of the
>>>>> > database-level worker processes. It could handle newly created
>>>>> > databases (avoiding a restart) and save resources when a database
>>>>> > is not in use. But this requires the worker processes to be
>>>>> > hierarchical; the postmaster becomes the grandfather of the
>>>>> > database-level worker processes in this case.
>>>>>
>>>>> I am using background workers this way in 9.5 at $work.
>>>>>
>>>>> In my case, one worker lives forever, wakes up on a set period, and
>>>>> starts a short-lived worker for every database, waiting for each
>>>>> one before starting the next.
>>>>>
>>>>> It was straightforward to implement. Looking back over the code,
>>>>> I see the global worker assigns its own PID to worker.bgw_notify_pid
>>>>> of each of its children, and also obtains a handle for each child
>>>>> from RegisterDynamicBackgroundWorker().
>>>>>
>>>>> I imagine the global quota worker would prefer to start workers
>>>>> for every database and then just wait for notifications from any
>>>>> of them, but that seems equally straightforward at first glance.
>>>>
>>>> There are servers with thousands of databases; a worker per database
>>>> is not a good idea.
>>>>
>>>> It should share ideas, and code, with the autovacuum process.
>>>>
>>>> I am not sure how effective an implementation based on bgworkers can
>>>> be. On servers with a large set of databases and a large set of
>>>> tables, it may identify a too-big table too late.
>>>>
>>>> Isn't it better to implement some quotas on the storage level?
>>>>
>>>> Regards
>>>>
>>>> Pavel
>>>>
>>>>> -Chap
>>>
>>>
>>> --
>>> Thanks
>>>
>>> Hubert Zhang
>>
>
> --
> Thanks
>
> Hubert Zhang
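To make the two mechanisms discussed above concrete, here are two
minimal sketches. Both use only the stock executor-hook and
background-worker APIs; every diskquota-specific name in them (the
blacklist lookup, the library and function names) is invented for
illustration and not taken from any posted patch.

First, the "before execute" enforcement from point 6: an ExecutorStart
hook that refuses to run a query whose result relations are on a
shared-memory blacklist assumed to be maintained by the workers.

    /*
     * Sketch of "before execute" enforcement.  Assumes the diskquota
     * workers keep a blacklist of over-quota relations in shared memory;
     * DiskQuotaRelationBlacklisted() is an assumed worker-side API.
     */
    #include "postgres.h"
    #include "fmgr.h"
    #include "executor/executor.h"
    #include "parser/parsetree.h"

    PG_MODULE_MAGIC;

    extern bool DiskQuotaRelationBlacklisted(Oid relid);  /* assumed API */

    static ExecutorStart_hook_type prev_ExecutorStart = NULL;

    static void
    diskquota_ExecutorStart(QueryDesc *queryDesc, int eflags)
    {
        ListCell   *lc;

        /* refuse to start a query whose result relations are over quota */
        foreach(lc, queryDesc->plannedstmt->resultRelations)
        {
            int         rti = lfirst_int(lc);
            RangeTblEntry *rte = rt_fetch(rti, queryDesc->plannedstmt->rtable);

            if (rte->rtekind == RTE_RELATION &&
                DiskQuotaRelationBlacklisted(rte->relid))
                ereport(ERROR,
                        (errcode(ERRCODE_DISK_FULL),
                         errmsg("relation %u has exceeded its disk quota",
                                rte->relid)));
        }

        if (prev_ExecutorStart)
            prev_ExecutorStart(queryDesc, eflags);
        else
            standard_ExecutorStart(queryDesc, eflags);
    }

    void
    _PG_init(void)
    {
        prev_ExecutorStart = ExecutorStart_hook;
        ExecutorStart_hook = diskquota_ExecutorStart;
    }

Second, the launcher pattern Chapman describes: register a short-lived
dynamic worker for one database, with bgw_notify_pid pointed back at the
launcher, and wait for it to finish before starting the next one.

    /*
     * Sketch of a launcher starting one short-lived worker per database,
     * one at a time.  The library/function names are illustrative.
     */
    #include "postgres.h"
    #include "miscadmin.h"
    #include "postmaster/bgworker.h"

    static void
    launch_worker_for_database(Oid dboid)
    {
        BackgroundWorker worker;
        BackgroundWorkerHandle *handle;
        pid_t       pid;

        memset(&worker, 0, sizeof(worker));
        worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
            BGWORKER_BACKEND_DATABASE_CONNECTION;
        worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
        worker.bgw_restart_time = BGW_NEVER_RESTART;
        snprintf(worker.bgw_library_name, BGW_MAXLEN, "diskquota");
        snprintf(worker.bgw_function_name, BGW_MAXLEN, "diskquota_worker_main");
        snprintf(worker.bgw_name, BGW_MAXLEN, "diskquota worker for db %u",
                 dboid);
        worker.bgw_main_arg = ObjectIdGetDatum(dboid);
        worker.bgw_notify_pid = MyProcPid;  /* launcher gets notifications */

        if (!RegisterDynamicBackgroundWorker(&worker, &handle))
        {
            ereport(LOG,
                    (errmsg("could not register diskquota worker for db %u",
                            dboid)));
            return;
        }

        /* one database at a time, as in Chapman's setup */
        WaitForBackgroundWorkerStartup(handle, &pid);
        WaitForBackgroundWorkerShutdown(handle);
    }

A launcher looping over pg_database with this function gives the
one-at-a-time behavior Chapman describes; waiting on each handle keeps
at most one extra worker alive, which matters on servers with thousands
of databases.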