Re: [HACKERS] 8.2 is 30% better in pgbench than 8.3
On Sat, 21 Jul 2007, Tom Lane wrote:

> With autovac off I see 8.3 as faster than 8.2 in pgbench.

Indeed. I'm seeing much better pgbench results from HEAD than from 8.2 when I set the configurations up identically. I'm hoping to have a comparison set to show everyone this week.

> and use -t at least 1000 or so (otherwise startup transients are
> significant).

I personally consider any pgbench run that lasts less than several minutes noise. On a system that hits 500 TPS like Pavel's, I'd want to see around 100,000 transactions before I consider the results significant. And then I'd want a set of 3 at each configuration, because even with longer runs you occasionally get really odd results; until you have 3 it can be unclear which is the weird one.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
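The discipline Greg describes (runs of ~100,000 transactions, repeated three times per configuration) can be sketched as a small driver script. This is a hypothetical sketch, not from the thread: it prints the pgbench invocations as a dry run rather than executing them, and the database name "bench", client count, and transaction count are assumptions you would tune to your own hardware.

```shell
# Dry-run sketch of the methodology: three runs per configuration, each
# long enough (~100k transactions at ~500 TPS is several minutes) that
# startup transients wash out. Swap run_cmd's echo for eval "$@" to
# execute against a real server; "bench" is a hypothetical database name.
run_cmd() { echo "$@"; }

CLIENTS=8          # assumed; match your concurrency target
TRANSACTIONS=100000

for run in 1 2 3; do
  run_cmd pgbench -c "$CLIENTS" -t "$TRANSACTIONS" bench
done
```

With three samples per configuration, a single outlier run is easy to spot and discard, which is exactly the point Greg makes above.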
Re: [HACKERS] 8.2 is 30% better in pgbench than 8.3
Hello,

I checked my tests again and got different results. This time I tested PostgreSQL on a dedicated server, and now 8.3 is about 20% faster. I didn't see a strong impact from autovacuum. All numbers are approximate. I ran pgbench 3x for each of the following configurations (autovacuum on / autovacuum off / statistics off) and for -t transactions of 100, 1000, and 4000:

-t 100:   629/638/639  630/630/646  581/654/656  -- 8.3
          443/519/519  542/555/566  670/692/694  -- 8.2
-t 1000:  622/626/653  630/635/653  631/631/652  -- 8.3
          523/528/541  522/535/550  679/681/690  -- 8.2
-t 4000:  632/635/644  248/385/651* 191/401/641* -- 8.3
          465/472/520  237/336/538* 249/379/702* -- 8.2

8.2 is faster only when statistics are off.

Note: * I didn't use the -v parameter (do_vacuum_accounts).

I am sorry for the noise.

Regards
Pavel Stehule
Re: [HACKERS] 8.2 is 30% better in pgbench than 8.3
Pavel Stehule wrote:

> Hello, I checked my tests again and got different results. This time I
> tested PostgreSQL on a dedicated server, and now 8.3 is about 20%
> faster. I didn't see a strong impact from autovacuum. All numbers are
> approximate. I ran pgbench 3x for each of the following configurations
> (autovacuum on, autovacuum off, statistics off) and for -t transactions
> of 100, 1000, and 4000.

In other news, 8.3 with current HOT is 13% faster than 8.2 at TPC-E in the first half hour. Performance does not fall off over 5 hours of test run, and most of the main tables never have autovacuum triggered at all. Unfortunately, we don't yet have a 5-hour 8.2 run to compare last-half-hour performance.

--Josh
Re: [HACKERS] 8.2 is 30% better in pgbench than 8.3
Tom,

> Note to all: we ***HAVE TO*** settle on some reasonable default
> vacuum_cost_delay settings before we can ship 8.3. With no cost delay
> and two or three workers active, 8.3's autovac does indeed send
> performance into the tank.

I've been using 20ms for most of my setups. That's aimed at reducing autovac to almost no impact at all, but taking a long time. Maybe 10ms?

--Josh
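As a concrete illustration of the settings under discussion, a postgresql.conf fragment along Josh's lines might look like the following. This is a sketch only: the 10ms-20ms range is what is being debated in this thread, not a settled default, and the parameter names should be checked against the release you are running.

```
# Throttle autovacuum I/O so concurrent workers don't tank performance
autovacuum = on
autovacuum_vacuum_cost_delay = 20ms   # Josh's usual value; 10ms also proposed
autovacuum_max_workers = 3            # "two or three workers active"
```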
[HACKERS] Full page images in WAL Cache Invalidation
Hi

I'm currently working on correctly flushing the catalog/relation/smgr caches on a read-only PITR slave during recovery. These are the things that currently cause me headaches:

1) The btree code sends out relcache invalidation messages during normal operation (no DDL statements are executed). This makes any simple flush-all-caches-if-DDL-was-executed scheme fail.

2) When a full page image is written to the WAL, the information about which tuple was updated is lost. So synthesizing cache invalidation messages from the WAL records would require reverse-engineering a full page image, which seems hard and error-prone.

3) Most cache invalidations seem to be generated by heap_insert (via PrepareForTupleInvalidation). Those seem to be reconstructable from the WAL quite easily. Those sent out via CacheInvalidateRelcache*, however, seem to leave no trace in the WAL.

What I'm wondering is how much performance is lost if I just let the slave flush all its caches whenever it replays a commit record of a transaction that executed DDL. To me it looks like that would only seriously harm performance if a lot of temporary tables are created on the master. Since there seem to be quite a few people who are unhappy about the current temp-table implementation, optimizing for that case might prove worthless if 8.4 or 8.5 changes the way temp tables are handled.

If this brute-force approach turns out to perform really badly, does anyone see an elegant way around (2) and (3)? (2) seems solvable by writing both logical and physical records to the WAL - similar to what that xlog compression idea needs (I, however, lost track of what came out of that discussion). But (3) seems to be messy..

greetings, Florian Pflug
Re: [HACKERS] Full page images in WAL Cache Invalidation
Florian G. Pflug [EMAIL PROTECTED] writes:

> I'm currently working on correctly flushing the catalog/relation/smgr
> caches on a readonly PITR slave during recovery.

I don't believe there is any workable solution to that short of logging cache-flush operations in WAL.

regards, tom lane
Re: [HACKERS] 8.2 is 30% better in pgbench than 8.3
Greg Smith [EMAIL PROTECTED] writes:

> I'd want a set of 3 at each configuration because even with longer
> runs, you occasionally get really odd results. Until you have 3 it can
> be unclear which is the weird one.

Yeah, pgbench results are notoriously unrepeatable. One issue is that the first run after pgbench -i sees conditions a lot different from subsequent runs (no free space in tables, indexes are more tightly packed than they will be later, etc). The recently added option to initialize the tables with a selected fillfactor might help here, but personally I've not experimented with it.

There seems to be also some of the good average but bad worst case behavior that Josh and others have pointed out in bigger benchmarks. I've always assumed this was due to checkpointing (and autovac if enabled). If your test run isn't long enough to cover one full checkpoint cycle then the results will be quite variable depending on whether it included a checkpoint or not. 8.3 might alleviate this effect to some extent.

regards, tom lane
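The fillfactor experiment Tom mentions could be driven roughly like this. A hypothetical dry-run sketch: it prints the commands instead of executing them, and -F is assumed to be the recently added init-time fillfactor option, so verify your pgbench build has it before running for real.

```shell
# Compare a fully packed initialization (-F 100) against a looser one
# (-F 90) to see how much of the first-run-after-init effect remains.
# Swap run_cmd's echo for eval "$@" to execute; "bench" is hypothetical.
run_cmd() { echo "$@"; }

for ff in 100 90; do
  run_cmd pgbench -i -s 10 -F "$ff" bench   # re-initialize at this fillfactor
  run_cmd pgbench -c 8 -t 100000 bench      # then measure
done
```

If the looser initialization narrows the gap between the first and later runs, that supports Tom's explanation that post-init packing is part of the unrepeatability.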
Re: [HACKERS] Full page images in WAL Cache Invalidation
Tom Lane wrote:

> Florian G. Pflug [EMAIL PROTECTED] writes:
>> I'm currently working on correctly flushing the catalog/relation/smgr
>> caches on a readonly PITR slave during recovery.
> I don't believe there is any workable solution to that short of
> logging cache-flush operations in WAL.

I still don't fully understand if and what problems are caused by overly aggressive cache flushing - what puzzles me is that DDL statements seem to be considered something rare on a production system by most people on this list, yet the caches seem to be highly optimized to avoid unnecessary invalidates. Leaving aside the btree issues, are you worried about performance problems, or can aggressive cache flushing hurt correctness?

The reason that I dislike WAL-logging of the flush operations so much is that since people are concerned about the amount of WAL traffic postgres generates, such a solution would introduce yet another GUC. And to make this reasonably foolproof, the slave would need a way to detect whether that GUC is set correctly on the master. All in all, that seems quite hackish...

greetings, Florian Pflug
Re: [HACKERS] Full page images in WAL Cache Invalidation
Florian G. Pflug [EMAIL PROTECTED] writes:

> Leaving aside the btree issues, are you worried about performance
> problems, or can aggressive cache flushing hurt correctness?

It shouldn't hurt correctness, but I don't think you entirely grasp the magnitude of the performance hit you'll take. The last time I tried running the regression tests with CLOBBER_CACHE_ALWAYS enabled, they took about one hundred times longer than normal. Now you are evidently hoping not to disable the caches entirely, but I don't understand how you are going to handle the various random CacheInvalidateRelcache calls that are here and there in the system (not only in btree).

regards, tom lane
Re: [HACKERS] Full page images in WAL Cache Invalidation
Tom Lane wrote:

> It shouldn't hurt correctness, but I don't think you entirely grasp
> the magnitude of the performance hit you'll take. The last time I
> tried running the regression tests with CLOBBER_CACHE_ALWAYS enabled,
> they took about one hundred times longer than normal. Now you are
> evidently hoping not to disable the caches entirely, but I don't
> understand how you are going to handle the various random
> CacheInvalidateRelcache calls that are here and there in the system
> (not only in btree).

I must be missing something... A quick grep for CacheInvalidateRelcache turned up these places:

src/backend/rewrite/rewriteDefine.c (EnableDisableRule)
src/backend/rewrite/rewriteSupport.c (SetRelationRuleStatus)
src/backend/access/nbtree/nbtinsert.c (_bt_insertonpg, _bt_newroot)
src/backend/access/nbtree/nbtpage.c (_bt_getroot, _bt_pagedel)
src/backend/commands/trigger.c (renametrig, EnableDisableTrigger)
src/backend/commands/cluster.c (mark_index_clustered)
src/backend/commands/indexcmds.c (DefineIndex)
src/backend/commands/tablecmds.c (setRelhassubclassInRelation)
src/backend/commands/vacuum.c (vac_update_relstats)
src/backend/catalog/heap.c (SetRelationNumChecks)
src/backend/catalog/index.c (index_drop, index_update_stats)

For CacheInvalidateHeapTuple, there is an additional hit in src/backend/commands/vacuum.c (move_chain_tuple, move_plain_tuple). Note that move_chain_tuple and move_plain_tuple are only called in repair_frag, which in turn is only used in full_vacuum_rel.

Now, to me all of these, with the exception of the btree functions, vac_update_relstats and move_chain/plain_tuple, look like they are only called during DDL statements. My basic assumption is that DDL is something quite uncommon on a production system.

This is obviously *totally* *wrong* for the regression tests, and I don't doubt that my scheme will show quite bad performance if you use that as a benchmark. But if you, say, use pgbench for testing, then the picture will be quite different, I imagine.

My strategy would be the following:
1) Mark the commit record if a transaction generated any invalidation events apart from the btree ones. The only other source of inval events seems to be VACUUM FULL on a system relation, which won't happen on a modestly well-tuned system, I think - any VACUUM FULL will need special treatment anyway.
2) At replay time, the caches are flushed after that record is replayed.

greetings, Florian Pflug
Re: [HACKERS] Full page images in WAL Cache Invalidation
Florian G. Pflug [EMAIL PROTECTED] writes:

> My basic assumption is that DDL is something quite uncommon on a
> production system.

I'm not sure I believe that, because of temp tables. There's also the problem that plain VACUUM (or ANALYZE) causes a relcache flush to update the relation-size statistics.

The real problem with the scheme you propose is that it turns a cache flush on one table into a system-wide cache flush. We might be able to do something about the temp-table case upstream: AFAICS there's no reason for backends to broadcast cache flushes for their own temp tables to other backends. But that's just a sketch of a thought at the moment.

Anyway, if you believe that DDL is infrequent, why are you resistant to the idea of WAL-logging cache flushes?

regards, tom lane
Re: [HACKERS] 8.2 is 30% better in pgbench than 8.3
On Sun, 22 Jul 2007, Tom Lane wrote:

> There seems to be also some of the good average but bad worst case
> behavior that Josh and others have pointed out in bigger benchmarks.
> I've always assumed this was due to checkpointing

A lot of variation is from checkpoints, some comes from index fullness/fragmentation, some from vacuuming, a bit is added based on what's in the buffer cache from previous runs, and there's a touch of randomness thrown on top even if you control all those things. My tests suggest the bad worst-case results in pgbench tests are almost always from checkpoints that happen when the buffer cache is filled with almost exclusively dirty buffers (which can happen very easily with pgbench). I am in the process of squashing these issues when running pgbench against 8.3.

A before/after look at pg_stat_bgwriter lets you measure the checkpoint variation. The ability to directly play with the index fillfactor lets you test/control the impact of that. Wrap all that into a scripting framework that runs the tests many times for you in a consistent fashion and then summarizes the results, and pgbench becomes a rough but completely serviceable tool. Expect to see a pgbench-tools project that does all that from me soon; my copy works but it's not ready for public consumption yet.

I've also got a working backport of all the pg_stat_bgwriter instrumentation that applies to 8.2, so people can run all this against the current version as well if they're in a position where they can hack a custom build. The hold-up on that is that the code that measures buffers allocated and those written by back-ends for 8.3 is wrapped into the "Automatic adjustment of bgwriter_lru_maxpages" patch, and I'm not going to finalize my unofficial backport until that gets applied. Once HOT wraps up, that loose end should get snipped easily enough.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
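The before/after pg_stat_bgwriter comparison Greg describes might be scripted roughly as below. A hypothetical dry-run sketch: it prints the commands rather than executing them, the column list matches the 8.3-era view but should be verified against your build, and the "bench" database name is an assumption.

```shell
# Snapshot pg_stat_bgwriter around a pgbench run, then compare the two
# snapshots to see how much checkpoint and backend-write I/O the run
# triggered. Swap run_cmd's echo for eval "$@" to execute for real.
run_cmd() { echo "$@"; }

SNAP="SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint, buffers_clean, buffers_backend, buffers_alloc FROM pg_stat_bgwriter"

run_cmd psql -At -d bench -c "$SNAP"    # snapshot before the run
run_cmd pgbench -c 8 -t 100000 bench
run_cmd psql -At -d bench -c "$SNAP"    # snapshot after; diff the counters
```

A run whose interval shows a large jump in buffers_checkpoint is one that absorbed a checkpoint, which helps explain its worse numbers.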
Re: [HACKERS] Full page images in WAL Cache Invalidation
Tom Lane wrote:

> I'm not sure I believe that, because of temp tables. There's also the
> problem that plain VACUUM (or ANALYZE) causes a relcache flush to
> update the relation-size statistics. The real problem with the scheme
> you propose is that it turns a cache flush on one table into a
> system-wide cache flush.

Yes.. It really builds on the idea that those flushes happen not too frequently.

> We might be able to do something about the temp-table case upstream:
> AFAICS there's no reason for backends to broadcast cache flushes for
> their own temp tables to other backends. But that's just a sketch of a
> thought at the moment.

I was actually hoping that some day temp tables wouldn't be stored in pg_class and friends at all. I was wondering if it wouldn't be possible to keep the information about them solely in the catcache and relcache (making the word "cache" a big lie). I didn't check if that is feasible at all, though - just an idea I got at one point.

> Anyway, if you believe that DDL is infrequent, why are you resistant
> to the idea of WAL-logging cache flushes?

For multiple reasons. First, cache invalidations are not the only problem caused by replaying system-table updates. The whole SnapshotNow business doesn't exactly make things easier either. So it feels like a lot of added complexity and code for little gain - unless a *lot* more things (like locking requests) are logged too.

Second, I'm sure that people would insist on a GUC to turn logging those records off if they don't need them in their setup, similar to that make_wal_compressable GUC that was proposed a few weeks ago. And if it's a GUC, the slave should be able to verify that it was set correctly on the master; otherwise this becomes a huge footgun.

Third, I try to keep the changes necessary on the master to a minimum - I feel that this will make merging the code at some point easier, because the risk of breaking something is smaller. Bugs in the slave code will maybe cause crashes and wrong results, but at least they won't cause data corruption on the master.

And last but not least, I have only limited time for this project - so I try to find the simplest workable solution, and maybe tune things later when practical experience shows where the real bottlenecks are.

greetings, Florian Pflug
Re: [HACKERS] Full page images in WAL Cache Invalidation
Florian G. Pflug [EMAIL PROTECTED] writes:

> Tom Lane wrote:
>> The real problem with the scheme you propose is that it turns a cache
>> flush on one table into a system-wide cache flush.
> Yes.. It really builds on the idea that those flushes happen not too
> frequently.

The problem is that flushes may be infrequent for any one table, but if you use an implementation that converts every per-table flush to a system-wide flush, it's not so infrequent anymore.

regards, tom lane
Re: [HACKERS] Full page images in WAL Cache Invalidation
Florian G. Pflug [EMAIL PROTECTED] writes:

>> Anyway, if you believe that DDL is infrequent, why are you resistant
>> to the idea of WAL-logging cache flushes?
> First, cache invalidations are not the only problem caused by
> replaying system-table updates. The whole SnapshotNow business doesn't
> exactly make things easier too. So it feels like a lot of added
> complexity and code for little gain - unless a *lot* more things (like
> locking requests) are logged too.

The mention of locking requests brought to mind the following gedankenexperiment:

1. Slave server backend is running some long-running query on table X.

2. WAL-reading process receives and executes DROP TABLE X.

(It doesn't even have to be a DROP; most varieties of ALTER are enough to create problems for a concurrently-running query.)

It's really hard to see how to defend against that without a fairly complete simulation of locking on the slave side.

regards, tom lane
Re: [HACKERS] Full page images in WAL Cache Invalidation
On Sun, 2007-07-22 at 19:58 +0200, Florian G. Pflug wrote:

> The reason that I dislike WAL-logging of the flush operations so much
> is that since people are concerned about the amount of WAL traffic
> postgres generates, such a solution would introduce yet another GUC.
> And to make this reasonably foolproof, the slave would need a way to
> detect whether that GUC is set correctly on the master. All in all,
> that seems quite hackish...

Seems like we should WAL-log flush operations first. It's fairly straightforward to do that, and we can then measure its effect on the primary easily enough. Your other suggestions seem much more complex.

I think we have a reasonable tolerance for increases in WAL and, as you said earlier, we may balance that out with other optimisations. Or we may find a more efficient way of doing it later. Let's aim to get that first query running, then go back and tune it later.

--
Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Full page images in WAL Cache Invalidation
On Sun, 2007-07-22 at 17:54 +0200, Florian G. Pflug wrote:

> 1) It seems that the btree code sends out relcache invalidation msgs
> during normal operation (No DDL statements are executed). This lets
> any simple flush-all-caches-if-ddl-was-executed scheme fail.

Cache invalidation is uniquely associated with these WAL record types:

XLOG_BTREE_NEWROOT
XLOG_BTREE_INSERT_META
XLOG_BTREE_DELETE_PAGE_META

None of those things happen with any real frequency in most use cases. We don't increase or reduce the number of levels of an index very frequently.

--
Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Full page images in WAL Cache Invalidation
Tom Lane [EMAIL PROTECTED] writes:

> 2. WAL-reading process receives and executes DROP TABLE X. (It doesn't
> even have to be a DROP; most varieties of ALTER are enough to create
> problems for a concurrently-running query.) It's really hard to see
> how to defend against that without a fairly complete simulation of
> locking on the slave side.

Well, it's not hard to see ways to do it. It may be hard to see *good* ways to do it. In the limit you can just block WAL replay of such records (I think vacuum page defragmenting is also in this category) until there are no queries running. This means a single long-running query can block log replay for a long time, though.

There's an additional problem from commands like CREATE INDEX CONCURRENTLY and vacuum's deleting of dead tuples, which aren't safe to execute if there are any standby queries running unless we can prove they're not too old. CLUSTER and most varieties of ALTER suffer from this problem as well.

It does seem like the shortest-path solution is just to get things working first with a one-big-lock implementation and then work on refining it after that.

--
Gregory Stark EnterpriseDB http://www.enterprisedb.com
[HACKERS] syslogging oddity
Somewhere along the way we seem to have made the syslogger's shutdown message go to stderr, even if we have redirected it:

[EMAIL PROTECTED] inst.test.5703]$ bin/pg_ctl -D data/ -w stop
waiting for server to shut downLOG:  logger shutting down
done
server stopped

Not sure if this is something we should worry about. I don't recall it happening previously.

cheers
andrew
Re: [HACKERS] syslogging oddity
Andrew Dunstan [EMAIL PROTECTED] writes:

> Somewhere along the way we seem to have made the syslogger's shutdown
> message go to stderr, even if we have redirected it:

I'm pretty sure it has done that all along; at least the design intention is that messages generated by the syslogger itself should go to its stderr. (Else, if the logger is having trouble, you might never get to find out why at all.) It might be reasonable to reduce "logger shutting down" to DEBUG1 or so, now that the facility has been around for awhile.

regards, tom lane