On Sat, Jun 1, 2013 at 1:49 PM, Simon Riggs <si...@2ndquadrant.com> wrote: > On 30 May 2013 19:48, Josh Berkus <j...@agliodbs.com> wrote: > >> There's currently some great ideas bouncing around about eliminating the >> overhead associated with FREEZE. However, I wanted to take a step back >> and take a look at the big picture for VACUUM, FREEZE and ANALYZE. > > That is a very commendable approach. We should do that more often. > >> The big, big picture is this: >> >> 90% of our users need to think about VACUUM/ANALYZE >> at least 10% of the time >> and 10% of our users need to think about it >> almost 90% of the time. > > When you say stuff like that, you should add "speculating from my > personal experience". People might get the impression you'd measured > this somehow and it could confuse the issue if you try to assemble a > high level viewpoint and then add in factoids that are just opinions. > > We should strive to measure such things. > >> That's considerably better than was the case 5 years ago, when vacuum >> management was a daily or weekly responsibility for nearly 100% of our >> users, but it's still not good enough. Our target should be that only >> those with really unusual setups should have to *ever* think about >> vacuum and analyze. > > I think that's where we already are given that 1000s of users have > quite small databases. > > The problem increases with scale. Larger databases have bigger > problems and make it easier to notice things are happening. > > I think you should mention that the evidence for these issues is > anecdotal and take careful notes of the backgrounds in which they > occurred. Saying things occur in all cases wouldn't be accurate or > helpful to their resolution. > > We should be seeking to contrast this against other databases to see > if we are better or worse than other systems. For example, recording > the moans of someone who is currently managing a 1 TB database, but > yet hasn't ever managed anything else that big is less valuable than a > balanced, experienced viewpoint (if such exists). > > Anyway, I support this approach, just wanted to make sure we do it in > sufficient detail to be useful.
I agree with all that. I don't have any data either, but I agree that AFAICT it seems to mostly be a problem for large (terabyte-scale) databases, or ones that are dreadfully short of I/O bandwidth. AWS, I'm looking at you. It would be interesting to make a list of what other issues people have seen using PostgreSQL on very large data sets. Complaints I've heard include: 1. Inexplicable failure of the planner to use indexes on very large tables, preferring an obviously-stupid sequential scan. This might be fixed by the latest index-size fudge factor work. 2. Lack of concurrent DDL. On VACUUM and ANALYZE specifically, I'd have to say that the most common problems I encounter are (a) anti-wraparound vacuums kicking in at inconvenient times and eating up too many resources and (b) users making ridiculous settings changes to avoid the problems caused by anti-wraparound vacuums kicking in at inconvenient times and eating up too many resources. The changes we've been discussing elsewhere may not completely solve this problem, because we'll still have to read all pages that aren't yet all-visible... but they should surely help. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers