Re: [PERFORM] duration logging setting in 7.4
On Thu, Nov 20, 2003 at 07:17:01PM -0500, Tom Lane wrote: Is it possible that you're sending a lot of queries that have an initial newline in the text? I'd expect the first line of log output for such a query to look as above. I don't think so, but it is possible that queries have e.g. two semicolons at the end - I've just noticed that separating two queries with two or more semicolons gives one empty log entry for each redundant semicolon. We'll debug our application keeping this in mind. Richard. -- First they ignore you. Then they laugh at you. Then they fight you. Then you win. - Mohandas Gandhi. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] [PERFORM] More detail on settings for pgavd?
On Thu, 2003-11-20 at 19:40, Matthew T. O'Connor wrote: I'm open to discussion on changing the defaults. Perhaps it would be better to use some non-linear (perhaps logarithmic) scaling factor, so that you wound up with something roughly like this:

  #tuples   activity% for vacuum
  1k        100%
  10k       70%
  100k      45%
  1M        20%
  10M       10%
  100M      8%

Just thinking out loud here, so disregard if you think it's chaff, but... if we had a system table pg_avd_defaults that held what we generally consider the best default percentages based on reltuples/pages, and added a column to pg_class (could be some place better but..) which could hold an overriding percentage, you could then have a column added to pg_stat_all_tables called vacuum_percentage, which would be a coalesce of the override percentage or the default percentages based on rel_tuples (or rel_pages). This would give autovacuum a place to look for each table as to when it should vacuum, and gives administrators the option to tweak it on a per-table basis if they find they need a specific table to vacuum at a different rate than the standard. Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
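Robert's sliding scale above can be sketched as a simple interpolation. This is purely a hypothetical illustration; the function name and the choice of linear interpolation on a log10 scale are my own, not anything in pg_avd:

```python
import math

# (tuple count, vacuum activity threshold as a fraction),
# transcribed from the table in the post above.
POINTS = [(1e3, 1.00), (1e4, 0.70), (1e5, 0.45),
          (1e6, 0.20), (1e7, 0.10), (1e8, 0.08)]

def vacuum_pct(ntuples):
    """Interpolate linearly between table rows on a log10 scale."""
    if ntuples <= POINTS[0][0]:
        return POINTS[0][1]
    if ntuples >= POINTS[-1][0]:
        return POINTS[-1][1]
    x = math.log10(ntuples)
    for (n0, p0), (n1, p1) in zip(POINTS, POINTS[1:]):
        x0, x1 = math.log10(n0), math.log10(n1)
        if x0 <= x <= x1:
            return p0 + (p1 - p0) * (x - x0) / (x1 - x0)

print(vacuum_pct(1_000))       # 1.0 -- vacuum after 100% of tuples change
print(vacuum_pct(100_000))     # 0.45
print(vacuum_pct(50_000_000))  # ~0.086, between the 10M and 100M rows
```

Any smooth curve through those points would do; the point is just that the percentage falls off quickly for small tables and flattens out for huge ones.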
Re: [PERFORM] duration logging setting in 7.4
Ryszard Lach [EMAIL PROTECTED] writes: It seems that empty statements are generated during opening of connection. Hmm. Try asking about that on the pgsql-jdbc list. I think the JDBC driver must actually be sending empty commands. Looking at the backend code, I realize that 7.4 will emit LOG: entries for empty query strings, whereas prior releases would not. This isn't a bug IMHO, but it does explain why you are noticing output that wasn't there before you updated to 7.4. regards, tom lane
Re: [HACKERS] [PERFORM] More detail on settings for pgavd?
Josh Berkus wrote: Matthew, True, but I think it would be one hour once, rather than 30 minutes 4 times. Well, generally it would be about 6-8 times at 2-4 minutes each. Are you saying that you can vacuum a 1 million row table in 2-4 minutes? While a vacuum of the same table with an additional 1 million dead tuples would take an hour? This is one of the things I had hoped to add to pg_autovacuum, but never got to. In addition to just the information from the stats collector on inserts, updates, and deletes, pg_autovacuum should also look at the FSM, and make decisions based on it. Anyone looking for a project? Hmmm ... I think that's the wrong approach. Once your database is populated, it's very easy to determine how to set the FSM for a given pg_avd level. If you're vacuuming after 20% updates, for example, just set fsm_pages to 20% of the total database pages plus growth safety margins. Ok. I'd be really reluctant to base pg_avd frequency on the fsm settings instead. What if the user loads 8GB of data but leaves fsm_pages at the default of 10,000? You can't do much with that; you'd have to vacuum if even 1% of the data changed. Ok, but as you said above it's very easy to set the FSM once you know your db size. The other problem is that calculating data pages from a count of updates+deletes would require pg_avd to keep more statistics and do more math for every table. Do we want to do this? I would think the math is simple enough to not be a big problem. Also, I did not recommend looking blindly at the FSM as our guide, but rather consulting it as another source of information as to when it would be useful to vacuum. 
I don't have a good plan as to how to incorporate this data, but to a large extent the FSM already tracks table activity and gives us the most accurate answer about storage growth (short of using something like contrib/pgstattuple, which takes nearly the same amount of time as an actual vacuum). But I can't imagine that 2% makes any difference on a large table. In fact I would think that 10-15% would hardly be noticeable; beyond that I'm not sure. I've seen performance lag at 10% of records, especially in tables where both update and select activity focus on one subset of the table (calendar tables, for example). Ok. Valid points, and again I think this points to the fact that pg_autovacuum needs to be more configurable. Being able to set different thresholds for different tables will help considerably. In fact, you may find that some tables should have a vac threshold much larger than the analyze threshold, while other tables might want the opposite. Sure. Though I think we can make the present configuration work with a little adjustment of the numbers. I'll have a chance to test on production databases soon. I look forward to hearing results from your testing. I would be surprised if you can notice the difference between a vacuum analyze and a vacuum, especially on large tables. It's substantial for tables with high statistics settings. A 1,000,000 row table with 5 columns set to statistics=250 can take 3 minutes to analyze on a medium-grade server. In my testing, I never changed the default statistics settings. I think you need two separate schedules. There are lots of times where a vacuum doesn't help, and an analyze is all that is needed. Agreed. And I've just talked to a client who may want to use pg_avd's ANALYZE scheduling but not use vacuum at all. BTW, I think we should have a setting for this; for example, if -V is -1, don't vacuum. That would be nice. Easy to add, and something I never thought of. I'm open to discussion on changing the defaults. 
Perhaps it would be better to use some non-linear (perhaps logarithmic) scaling factor. That would be cool, too. Though a count of data pages would be a better scale than a count of rows, and equally obtainable from pg_class. But we track tuples because we can compare against the count given by the stats system. I don't know of a way (other than looking at the FSM, or contrib/pgstattuple) to see how many dead pages exist.
Re: [HACKERS] [PERFORM] More detail on settings for pgavd?
Matthew T. O'Connor wrote: But we track tuples because we can compare against the count given by the stats system. I don't know of a way (other than looking at the FSM, or contrib/pgstattuple) to see how many dead pages exist. I think making pg_autovacuum dependent on pgstattuple is a very good idea. Probably it might be a good idea to extend pgstattuple to return pages that are excessively contaminated and clean them ASAP. Step by step getting closer to a daemonized vacuum. Shridhar
Re: [HACKERS] [PERFORM] More detail on settings for pgavd?
Shridhar Daithankar wrote: Matthew T. O'Connor wrote: But we track tuples because we can compare against the count given by the stats system. I don't know of a way (other than looking at the FSM, or contrib/pgstattuple) to see how many dead pages exist. I think making pg_autovacuum dependent on pgstattuple is a very good idea. Only if pgstattuple can become much cheaper than it is now. Based on the testing I did when I wrote pg_autovacuum, pgstattuple cost nearly the same amount as a regular vacuum. Given that, what have we gained from that work? Wouldn't it just be better to run a vacuum and actually reclaim space, rather than running pgstattuple and just looking to see if there is free space to be reclaimed? Perhaps we could use pgstattuple occasionally to see if we are doing a good job of keeping the amount of dead space to a reasonable level, but I'm still not really sure about this. Probably it might be a good idea to extend pgstattuple to return pages that are excessively contaminated and clean them ASAP. Step by step getting closer to a daemonized vacuum. I don't know of any way to clean a particular set of pages. This is something that has been talked about (partial vacuums and such), but I think Tom has raised several issues with it; I don't remember the details. Right now the only tool we have to reclaim space is vacuum, a whole table at a time.
Re: [HACKERS] [PERFORM] More detail on settings for pgavd?
Matthew, As long as pg_autovacuum remains a contrib module, I don't think any changes to the system catalogs will be made. If pg_autovacuum is deemed ready to move out of contrib, then we can talk about the above. But we could create a config file that would store stuff in a flatfile table, OR we could add our own system table that would be created when one initializes pg_avd. Just an idea. Mind you, I'm not so sure that we want to focus immediately on per-table settings. I think that we want to get the automatic settings working fairly well first; a lot of new DBAs would use the per-table settings to shoot themselves in the foot. So we need to be able to make a strong recommendation to try the automatic settings first. Are you saying that you can vacuum a 1 million row table in 2-4 minutes? While a vacuum of the same table with an additional 1 million dead tuples would take an hour? I'm probably exaggerating. I do know that I can vacuum a fairly clean 1-5 million row table in less than 4 minutes. I've never let such a table get to 50% dead tuples, so I don't really know how long that takes. Call me a coward if you like ... I'd be really reluctant to base pg_avd frequency on the fsm settings instead. What if the user loads 8GB of data but leaves fsm_pages at the default of 10,000? You can't do much with that; you'd have to vacuum if even 1% of the data changed. Ok, but as you said above it's very easy to set the FSM once you know your db size. Actually, thinking about this I realize that PG_AVD and the Perl-based postgresql.conf configuration script I was working on (darn, who was doing that with me?) need to go together. With pg_avd, setting max_fsm_pages is very easy; without it it's a bit of guesswork. So I think we can do this: for 'auto' settings: If max_fsm_pages is between 13% and 100% of the total database pages, then set the vacuum scale factor to match 3/4 of the fsm_pages setting, e.g. 
database = 18,000,000 data pages; max_fsm_pages = 3,600,000; set vacuum scale factor = 3.6mil/18mil * 3/4 = 0.15. If max_fsm_pages is less than 13% of database pages, issue a warning to the user (log it, if possible) and set the scale factor to 0.1. If it's greater than 100%, set it to 1 and leave it alone. I don't have a good plan as to how to incorporate this data, but to a large extent the FSM already tracks table activity and gives us the most accurate answer about storage growth (short of using something like contrib/pgstattuple, which takes nearly the same amount of time as an actual vacuum). I don't really think we need to do dynamic monitoring at this point. It would be a lot of engineering to check data page pollution without having significant performance impact. It's doable, but something I think we should hold off until version 3. It would mean hacking the FSM, which is a little beyond me right now. In my testing, I never changed the default statistics settings. Ah. Well, a lot of users do to resolve query problems. But we track tuples because we can compare against the count given by the stats system. I don't know of a way (other than looking at the FSM, or contrib/pgstattuple) to see how many dead pages exist. No, but for scaling you don't need the dynamic count of tuples or of dead tuples; pg_class holds a reasonably accurate count of pages per table as of last vacuum. -- Josh Berkus Aglio Database Solutions San Francisco
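Josh's 'auto' rule can be written out as a short sketch. The function name is my own, the print stands in for whatever warning mechanism pg_avd would use, and I'm reading "set it to 1" as clamping the ratio to 100% before taking 3/4 of it, which is one plausible interpretation:

```python
# A sketch of the proposed 'auto' vacuum scale factor rule (hypothetical
# names; not actual pg_autovacuum code).
def auto_scale_factor(max_fsm_pages, total_db_pages):
    ratio = max_fsm_pages / total_db_pages
    if ratio < 0.13:
        # Undersized FSM: warn and fall back to a conservative factor.
        print("warning: max_fsm_pages is under 13% of database pages")
        return 0.1
    # Clamp the ratio at 100% of database pages, then take 3/4 of it.
    return min(ratio, 1.0) * 0.75

# The example from the post: 18M data pages, max_fsm_pages = 3.6M.
print(auto_scale_factor(3_600_000, 18_000_000))  # 0.15
```

Note the floor of 0.13 * 3/4 ~ 0.1 lines up with the "don't vacuum below 10% turnover" discussion later in the thread.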
Re: [HACKERS] [PERFORM] More detail on settings for pgavd?
Josh Berkus wrote: Matthew, But we could create a config file that would store stuff in a flatfile table, OR we could add our own system table that would be created when one initializes pg_avd. I don't want to add tables to existing databases, as I consider that clutter and I never like using tools that clutter my production databases. I had considered using a pg_autovacuum database that, if found, would store customized settings for individual tables / databases. Dunno if this is a good idea, but it might make a good stopgap until people are comfortable modifying the system catalogs for autovacuum. Actually, this might be a necessary addition, as pg_autovacuum currently suffers from the startup transients that the FSM used to suffer from; that is, it doesn't remember anything that happened the last time it ran. A pg_autovacuum database could also be used to store thresholds and counts from the last time it ran. Just an idea. Mind you, I'm not so sure that we want to focus immediately on per-table settings. I think that we want to get the automatic settings working fairly well first; a lot of new DBAs would use the per-table settings to shoot themselves in the foot. So we need to be able to make a strong recommendation to try the automatic settings first. I agree in principle; the question is what are the best settings. I still think it will be hard to find a one size fits all, but I'm sure we can do better than what we have. Actually, thinking about this I realize that PG_AVD and the Perl-based postgresql.conf configuration script I was working on (darn, who was doing that with me?) need to go together. With pg_avd, setting max_fsm_pages is very easy; without it it's a bit of guesswork. So I think we can do this: for 'auto' settings: If max_fsm_pages is between 13% and 100% of the total database pages, then set the vacuum scale factor to match 3/4 of the fsm_pages setting, e.g. 
database = 18,000,000 data pages; max_fsm_pages = 3,600,000; set vacuum scale factor = 3.6mil/18mil * 3/4 = 0.15. Where are you getting 13% from? Do you know of an easy way to get a count of the total pages used by a whole cluster? I guess we can just iterate over all the tables in all the databases and sum up the total number of pages. We already iterate over them all; we just don't sum it up. If max_fsm_pages is less than 13% of database pages, issue a warning to the user (log it, if possible) and set the scale factor to 0.1. If it's greater than 100%, set it to 1 and leave it alone. Again I ask where 13% is coming from, and also where is 0.1 coming from? I assume these are your best guesses right now, but not more than that. I do like the concept, though, as long as we find good values for min_fsm_percentage and min_autovac_scaling_factor. But we track tuples because we can compare against the count given by the stats system. I don't know of a way (other than looking at the FSM, or contrib/pgstattuple) to see how many dead pages exist. No, but for scaling you don't need the dynamic count of tuples or of dead tuples; pg_class holds a reasonably accurate count of pages per table as of last vacuum. Which we already keep a copy of inside of pg_autovacuum, and update after we issue a vacuum.
Re: [HACKERS] [PERFORM] More detail on settings for pgavd?
Matthew, Actually, this might be a necessary addition as pg_autovacuum currently suffers from the startup transients that the FSM used to suffer from, that is, it doesn't remember anything that happened the last time it ran. A pg_autovacuum database could also be used to store thresholds and counts from the last time it ran. I don't see how a separate database is better than a table in the databases, except that it means scanning only one table and not one per database. For one thing, making it a separate database could make it hard to back up and move your database+pg_avd config. But I don't feel strongly about it. Where are you getting 13% from? 13% * 3/4 ~~ 10%. And I think both of us agree that vacuuming tables with less than 10% changes is excessive and could lead to problems on its own, like overlapping vacuums. Do you know of an easy way to get a count of the total pages used by a whole cluster? Select sum(relpages) from pg_class. I do like the concept though as long as we find good values for min_fsm_percentage and min_autovac_scaling_factor. See above. I propose 0.13 and 0.1. Which we already keep a copy of inside of pg_autovacuum, and update after we issue a vacuum. Even easier then. BTW, do we have any provisions to avoid overlapping vacuums? That is, to prevent a second vacuum on a table if an earlier one is still running? -- -Josh Berkus Aglio Database Solutions San Francisco
Re: [HACKERS] [PERFORM] More detail on settings for pgavd?
Josh Berkus wrote: Matthew, I don't see how a separate database is better than a table in the databases, except that it means scanning only one table and not one per database. For one thing, making it a separate database could make it hard to back up and move your database+pg_avd config. Basically, I don't like the idea of modifying users' databases; besides, in the long run most of what needs to be tracked will be moved to the system catalogs. I kind of consider the pg_autovacuum database to be equivalent to the changes that will need to be made to the system catalogs. I guess it could make it harder to backup if you are moving your database between clusters. Perhaps, if you create a pg_autovacuum schema inside of your database, then we could use that. I just don't like tools that drop things into your database. Where are you getting 13% from? 13% * 3/4 ~~ 10%. And I think both of us agree that vacuuming tables with less than 10% changes is excessive and could lead to problems on its own, like overlapping vacuums. I certainly agree that less than 10% would be excessive; I still feel that 10% may not be high enough though. That's why I kinda liked the sliding scale I mentioned earlier, because I agree that for very large tables, something as low as 10% might be useful, but most tables in a database would not be that large. Do you know of an easy way to get a count of the total pages used by a whole cluster? Select sum(relpages) from pg_class. duh BTW, do we have any provisions to avoid overlapping vacuums? That is, to prevent a second vacuum on a table if an earlier one is still running? Only that pg_autovacuum isn't smart enough to kick off more than one vacuum at a time. Basically, pg_autovacuum issues a vacuum on a table and waits for it to finish, then checks the next table in its list to see if it needs to be vacuumed; if so, it does it and waits for that vacuum to finish. 
There was some discussion of issuing concurrent vacuums against different tables, but it was decided that since vacuum is I/O bound, it would only make sense to issue concurrent vacuums that were on different spindles, which is not something I wanted to get into. Also, given the recent talk about how vacuum is still such a performance hog, I can't imagine what multiple concurrent vacuums would do to performance. Maybe as 7.5 develops and many of the vacuum performance issues are addressed, we can revisit this question.
Re: [HACKERS] [PERFORM] More detail on settings for pgavd?
Matthew, Basically, I don't like the idea of modifying users' databases; besides, in the long run most of what needs to be tracked will be moved to the system catalogs. I kind of consider the pg_autovacuum database to be equivalent to the changes that will need to be made to the system catalogs. OK. As I said, I don't feel strongly about it. I certainly agree that less than 10% would be excessive, I still feel that 10% may not be high enough though. That's why I kinda liked the sliding scale I mentioned earlier, because I agree that for very large tables, something as low as 10% might be useful, but most tables in a database would not be that large. Yes, but I thought that we were taking care of that through the threshold value? A sliding scale would also be OK. However, that would definitely require a leap to storing per-table pg_avd statistics and settings. Only that pg_autovacuum isn't smart enough to kick off more than one vacuum at a time. Basically, pg_autovacuum issues a vacuum on a table and waits for it to finish, then checks the next table in its list to see if it needs to be vacuumed; if so, it does it and waits for that vacuum to finish. OK, then, we just need to detect the condition of the vacuums piling up because they are happening too often. -- -Josh Berkus Aglio Database Solutions San Francisco
Re: [HACKERS] [PERFORM] More detail on settings for pgavd?
Josh Berkus wrote: Matthew, I certainly agree that less than 10% would be excessive, I still feel that 10% may not be high enough though. That's why I kinda liked the sliding scale I mentioned earlier, because I agree that for very large tables, something as low as 10% might be useful, but most tables in a database would not be that large. Yes, but I thought that we were taking care of that through the threshold value? Well, the threshold is a combination of the base value and the scaling factor, which you are proposing is 0.1, so the threshold is base + (scaling factor)(num of tuples). So with the default base of 1000 and your 0.1 you would have this:

  Num Rows    Threshold   Percent
  1,000       1,100       110%
  10,000      2,000       20%
  100,000     11,000      11%
  1,000,000   101,000     ~10%

I don't like how that looks, hence the thought of some non-linear scaling factor that would still allow the percent to reach 10%, but at a slower rate. Perhaps just a larger base value would suffice, but I think small-table performance is going to suffer much above 1000. Anyone else have an opinion on the table above? Good / Bad / Indifferent? A sliding scale would also be OK. However, that would definitely require a leap to storing per-table pg_avd statistics and settings. I don't think it would; it would correlate the scaling factor with the number of tuples, no per-table settings required. Only that pg_autovacuum isn't smart enough to kick off more than one vacuum at a time. Basically, pg_autovacuum issues a vacuum on a table and waits for it to finish, then checks the next table in its list to see if it needs to be vacuumed; if so, it does it and waits for that vacuum to finish. OK, then, we just need to detect the condition of the vacuums piling up because they are happening too often. That would be good to look into at some point, especially if vacuum is going to get slower as a result of the page loop delay patch that has been floating around.
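Matthew's threshold table above follows directly from the formula he states, threshold = base + (scaling factor)(num of tuples). A quick sketch reproducing it (function and constant names are mine):

```python
# Threshold arithmetic from the discussion: base = 1000, scaling
# factor = 0.1 (the values under debate, not fixed pg_autovacuum
# defaults).
BASE = 1000
SCALE = 0.1

def vacuum_threshold(ntuples, base=BASE, scale=SCALE):
    """Tuples that must change before a vacuum: base + scale * reltuples."""
    return base + scale * ntuples

for n in (1_000, 10_000, 100_000, 1_000_000):
    t = vacuum_threshold(n)
    print(f"{n:>9,}  {t:>9,.0f}  {100 * t / n:.0f}%")
```

This makes the shape of the objection visible: the base term dominates for small tables (110% at 1,000 rows), while the curve flattens toward the bare scaling factor (10%) as tables grow.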