On 12/23/2014 07:52 PM, Stephen Frost wrote:
[snip]
Manually performing VACUUM / VACUUM ANALYZE on the (few) affected
tables every 12h or so fixes the performance problem for the
particular queries without impacting the other users too much ---
the tables and indexes in question have been moved to a separate
tablespace/disk volume of their own.
Autovacuum can certainly run vacuum/analyze on a few tables every 12
hours, so I'm not really following where you see autovacuum being unable
to cope.  I agree that there *are* such cases, but getting more
information about those cases and exactly what solution *does* work
would really help us improve autovacuum to address those use-cases.

I'll try to. I don't have direct access, and the use case is quite edgy to be fair.
Plus, the configuration and hardware leaves quite a bit to be desired...

...but it's a real use case and the solution (even if only treating the symptoms) is quite straight-forward and easy.
In short, this addresses situations where some tables have a much
higher update rate than the rest of the database so that performance
degrades with time --- the application became unusable after about 6
days' worth of updates until the manual vacuums were setup
This really looks like a configuration issue with autovacuum..  Perhaps
you need to make it more aggressive than the default and have it run
more threads?

Yes to both. Up to something which actually affected performance a bit.
But basically only a few tables exhibited this behaviour among several hundreds in this particular situation.

Have you turned the autovacuum logging up all the way?
Is autovacuum giving up due to locking?
Not one of my systems, and I don't have access to it anymore, but I don't think this was the reason.

However, having some hundred million deleted rows piling every few hours quite increases the load. For the record, the (closed-source) application did issue the DELETEs on the table, so partitioning + TRUNCATE child_part was not applicable.


In any case, I was aiming at making this kind of operations possible and easier --- regardless of whether they are solving the right problem or not, or whether there exists an optimal solution --- since I have seen some real life solutions that could benefit from it. I agree that routine index maintenance is a better match for this feature, though :)
     REINDEX CONCURRENTLY  (but not the blocking, regular, one)
     REFRESH MATERIALIZED VIEW CONCURRENTLY (but not the blocking one)
These are interesting, but would these make sense at the role level?
Both of these commands explicitly take specific relations to operate
against, after all.
Yup. Let's imagine a cron job invoking psql in order to perform
maintenance routine.
If they make sense at a relation level then they should be
relation-level GRANT'd permissions, not role-level attributes.

Same as before.
Let's imagine this coupled with REINDEX SCHEMA CONCURRENTLY ... or simply when constructing the list of tables dynamically and there is no other use for such a grant. Arguably, this isn't that much of a problem if there exists a way to easily revoke all such permissions from all objects in one go (just like recently discussed in another thread)

The particular command(s) can be generated on-the-fly by querying
the catalog and then send them in one go to be run sequentially by
the one backend.... as a crude form of rate
limiting/quality-of-service of sorts ("renice -p" or even "ionice
-p" seems quite inadequate).
This sounds like it's something that we might want an autovacuum-like
background process to handle..  Some kind of auto-reindex-concurrently.
There are already plans to deal with updating of materialized views, as
I understand it.

While I can definitively see it for materialized views (they *are* views, after all), this pattern potentially gets us adding everything but the kitchen sink inside the database. FWIW, it's only a matter of providing a mechanism for maintenance routines to use very unprivileged users to perform their duties on the whole cluster without having to explicitly grant permissions and/or include these into another, regular, role. Please keep in mind that these roles [having only LOGIN and MAINTENANCE] would NOT be able to perform any DML or DDL whatsoever, nor any queries (unless explicitly granted permission for SELECTs).

[snip]
Yes. That's the reason for the question marks  :-\
Some "dump to csv then load somewhere else" kind of jobs might
benefit from this feature, but I'm not sure the convenience is worth
the risk.
I've run into quite a few processes which would really benefit from
this, and would even be safe to use (the processes running the COPY
commands don't have any rights on the directories except through PG),
but it's not clear if that use-case is sufficiently broad for the
feature to be worthwhile..  At least, some feel it isn't.  Can you
describe your use-case more and perhaps the needle will move on that
point?

I don't have any concrete example for this right now, but a "psql -f query_file.sql | processing_script | load_into_somewhere" scheduled to run nightly (every few hours, even) is definitively quite common. The COPY argument (which actually reads and outputs information) actually falsifies the above assumptions, I'm afraid. So this task would need explicit permissions anyway.

You seem to have better arguments in favor of this last feature, though.

[snip]
I suppose, but I think you're over-simplying your use-case.  Consider
this- do you want this login role to be a member of superuser roles?
What about DBA or other privilege-user roles?

As pointed out by David Johnston, impersonating superusers shall be explicitly forbidden. The "noimpersonate" attribute (actually, we might need a better name to avoid the implication that it means "can not impersonate others" but "can not be impersonated") could easily block this loophole.

This feature is definitely orthogonal to the proposed "magical audit role" proposal, which I heartily agree with. We might need to tweak logging a bit under this circumstances to reveal the login user vs the authorization user (quite like UID vs EUID for processess), though.



Since the changes to pg_authid seem to be about to be reverted, I'll wait a bit for a WIP patch --- this is definitively a lot more effort without the bitmask and helper funcs :( There existed another suggestion ---in the form of text-based "capabilities" IIRC--- which might be used for this purpose, however.

Thanks,

    J.L.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to