Re: [PATCHES] Load Distributed Checkpoints, take 3
Tom Lane wrote: Hmm. But if we're going to do that, we might as well have a checkpoint for our troubles, no? The reason for the current design is the assumption that a bgwriter_all scan is less burdensome than a checkpoint, but that is no longer true given this rewrite.

Per the comments in CreateCheckPoint, another reason is to skip extra checkpoints to avoid writing two checkpoints to the same page, risking losing both on a crash:

 * If this isn't a shutdown or forced checkpoint, and we have not inserted
 * any XLOG records since the start of the last checkpoint, skip the
 * checkpoint. The idea here is to avoid inserting duplicate checkpoints
 * when the system is idle. That wastes log space, and more importantly it
 * exposes us to possible loss of both current and previous checkpoint
 * records if the machine crashes just as we're writing the update.
 * (Perhaps it'd make even more sense to checkpoint only when the previous
 * checkpoint record is in a different xlog page?)

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[PATCHES] pgstat_drop_relation bugfix
I wrote: pgstat_drop_relation() is expecting relid (pg_class.oid) as the argument, but we pass it relfilenode. I'm trying to fix the bug, because there is a possibility that some useless statistics data continues to occupy parts of the statistics table. Here is a patch to fix undropped entries in the runtime statistics table. Now smgr records the relation oids and uses them to drop the entries corresponding to the relations. We could easily create stray entries using the following queries:

CREATE TABLE test (i integer);
INSERT INTO test VALUES (1);
TRUNCATE test;
DROP TABLE test;

Since we don't discard any stat entries except via pg_stat_reset(), those useless entries would cause a memory leak, though a very small one. I feel my fix is uncool; let me know if there are any better ways. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center pgstat_drop_relation.patch Description: Binary data
Re: [PATCHES] [HACKERS] msvc and vista fun
On Mon, Jun 25, 2007 at 03:19:32PM -0400, Andrew Dunstan wrote: I wrote: Would making a change like this in those 12 places be so ugly? Specifically, I propose the following patch, which should fix the issues buildfarm apparently has with the XP command shell (or some incarnations of it). Yeah. Dave has verified that it's actually because of the XP stuff (he can repro the failure on XP), so I guess we don't have much choice. Strictly speaking, we don't need the changes in builddoc.bat and install.bat, but I did them for the sake of completeness. Since it should not affect anyone not setting XP_EXIT_FIX it seems fairly low risk. I've applied it, except for builddoc.bat. Since we didn't send out an exitcode anyway there, it makes no difference. Left them in install.bat because there's an actual exitcode there. //Magnus
Re: [PATCHES] Load Distributed Checkpoints, take 3
Tom Lane wrote: Greg Smith [EMAIL PROTECTED] writes: The way transitions between completely idle and all-out bursts happen was one problematic area I struggled with. Since the LRU point doesn't move during the idle parts, and the lingering buffers have a usage_count > 0, the LRU scan won't touch them; the only way to clear out a bunch of dirty buffers leftover from the last burst is with the all-scan. One thing that might be worth changing is that right now, BgBufferSync starts over from the current clock-sweep point on each call --- that is, each bgwriter cycle. So it can't really be made to write very many buffers without excessive CPU work. Maybe we should redefine it to have some static state carried across bgwriter cycles, such that it would write at most N dirty buffers per call, but scan through X percent of the buffers, possibly across several calls, before returning to the (by now probably advanced) clock-sweep point. This would allow a larger value of X to be used than is currently practical. You might wish to recheck the clock sweep point on each iteration just to make sure the scan hasn't fallen behind it, but otherwise I don't see any downside. The scenario where somebody re-dirties a buffer that was cleaned by the bgwriter scan isn't a problem, because that buffer will also have had its usage_count increased and thereby not be a candidate for replacement. Something along those lines could be useful. I've thought of that before, but it never occurred to me that if a page in front of the clock hand is re-dirtied, it's no longer a candidate for replacement anyway... I'm going to leave the all- and lru- bgwriter scans alone for now to get this LDC patch finished. We still have the bgwriter autotuning patch in the queue. Let's think about this in the context of that patch. As a general comment on this subject, a lot of the work in LDC presumes you have an accurate notion of how close the next checkpoint is.
Yeah; this is one reason I was interested in carrying some write-speed state across checkpoints instead of having the calculation start from scratch each time. That wouldn't help systems that sit idle a long time and suddenly go nuts, but it seems to me that smoothing the write rate across more than one checkpoint could help if the fluctuations occur over a timescale of a few checkpoints.

Hmm. This problem only applies to checkpoints triggered by checkpoint_segments; time tends to move forward at a constant rate. I'm not worried about small fluctuations or bursts. As I argued earlier, the OS still buffers the writes and should give some extra smoothing of the physical writes. I believe bursts of say 50-100 MB would easily fit in OS cache, as long as there's enough gap between them. I haven't tested that, though.

Here's a proposal for an algorithm to smooth bigger bursts. The basic design is the same as before: we keep track of elapsed time and elapsed xlogs, and based on them we estimate how much progress in flushing the buffers we should've made by now, and then we catch up until we reach that. The estimate for time is the same. The estimate for xlogs gets more complicated. Let's have a few definitions first:

Ro = elapsed segments / elapsed time, from the previous checkpoint cycle. For example, 1.25 means that the checkpoint was triggered by checkpoint_segments, and we had spent 1/1.25 = 80% of checkpoint_timeout when we reached checkpoint_segments. 0.25 would mean that the checkpoint was triggered by checkpoint_timeout, and we had spent 25% of checkpoint_segments by then.

Rn = elapsed segments / elapsed time so far in the current in-progress checkpoint.

t = elapsed time, as a fraction of checkpoint_timeout (0.0 - 1.0, though it could exceed 1 if the next checkpoint is already due)

s = elapsed xlog segments, as a fraction of checkpoint_segments (0.0 - 1.0, though again it could exceed 1 if the next checkpoint is already due)

R = estimate for WAL segment consumption rate, as checkpoint_segments / checkpoint_timeout:

R = Ro * (1 - t) + Rn * t

In other words, at the beginning of the checkpoint, we give more weight to the state carried over from the previous checkpoint. As we go forward, more weight is given to the rate calculated from the current cycle. From R, we extrapolate how much progress we should've made by now:

Target progress = R * t

This would require saving just one number from the previous cycle (Ro), and there is no requirement to call the estimation function at steady time intervals, for example. It gives a pretty steady I/O rate even if there are big bursts in WAL activity, but still reacts to changes in the rate.

I'm not convinced this is worth the effort, though. First of all, this is only a problem if you use checkpoint_segments to control your checkpoints, so you can lower checkpoint_timeout to do more work during the idle periods. Secondly, with the optimization of not flushing buffers during checkpoint that were dirtied after the start of
Re: [PATCHES] Load Distributed Checkpoints, take 3
Heikki Linnakangas [EMAIL PROTECTED] writes: Tom Lane wrote: Hmm. But if we're going to do that, we might as well have a checkpoint for our troubles, no? The reason for the current design is the assumption that a bgwriter_all scan is less burdensome than a checkpoint, but that is no longer true given this rewrite. Per the comments in CreateCheckPoint, another reason is to skip extra checkpoints to avoid writing two checkpoints to the same page, risking losing both on a crash: That's not the reason for the current design --- that logic existed long before bgwriter did. Anyway, if there are no XLOG records since the last checkpoint, there's probably nothing in shared buffers that needs flushing. There might be some dirty hint-bits, but the only reason to push those out is to make some free buffers available, and doing that is not checkpoint's job (nor the all-buffers scan's job); that's what the LRU scan is for. The only reason that the all-buffers scan was put in was to try to make the next checkpoint cheaper. That reason falls to the ground with LDC. What we have left is merely excess I/O. regards, tom lane
Re: [PATCHES] Load Distributed Checkpoints, take 3
Tom Lane wrote: Anyway, if there are no XLOG records since the last checkpoint, there's probably nothing in shared buffers that needs flushing. There might be some dirty hint-bits, but the only reason to push those out is to make some free buffers available, and doing that is not checkpoint's job (nor the all-buffers scan's job); that's what the LRU scan is for. Yeah, except the LRU scan is not doing a very good job at that. It will ignore buffers with usage_count > 0, and it only scans bgwriter_lru_percent buffers ahead of the clock hand. One pathological case is a COPY of a table slightly smaller than shared_buffers. That will fill the buffer cache. If you then have a checkpoint, and after that a SELECT COUNT(*), or a VACUUM, the buffer cache will be full of pages with just hint-bit updates, but no WAL activity since the last checkpoint. But let's fix the LRU scan, rather than work around its deficiencies. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] Load Distributed Checkpoints, take 3
Heikki Linnakangas [EMAIL PROTECTED] writes: Tom Lane wrote: ... that's what the LRU scan is for. Yeah, except the LRU scan is not doing a very good job at that. It will ignore buffers with usage_count > 0, and it only scans bgwriter_lru_percent buffers ahead of the clock hand. Which is exactly in sync with which buffers are actually candidates for replacement. It could be looking further ahead of the clock hand, as per my previous suggestion, but there is no need to push out buffers with usage_count > 0 until after the sweep has decremented that to 0. Doing so would tend to create excess I/O --- it makes it more likely that multiple page-dirtying events on a page will result in separate writes instead of a single write. One pathological case is a COPY of a table slightly smaller than shared_buffers. That will fill the buffer cache. If you then have a checkpoint, and after that a SELECT COUNT(*), or a VACUUM, the buffer cache will be full of pages with just hint-bit-updates, but no WAL activity since last checkpoint. This argument supposes that the bgwriter will do nothing while the COPY is proceeding. regards, tom lane
Re: [PATCHES] Load Distributed Checkpoints, take 3
Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: One pathological case is a COPY of a table slightly smaller than shared_buffers. That will fill the buffer cache. If you then have a checkpoint, and after that a SELECT COUNT(*), or a VACUUM, the buffer cache will be full of pages with just hint-bit-updates, but no WAL activity since last checkpoint. This argument supposes that the bgwriter will do nothing while the COPY is proceeding. It will clean buffers ahead of the COPY, but it won't write the buffers COPY leaves behind since they have usage_count=1. Let me demonstrate this with an imaginary example with shared_buffers=4:

buf_id  usage_count  dirty
1       0            f
2       0            f
3       0            f
4       0            f

After COPY:

buf_id  usage_count  dirty
1       1            t
2       1            t
3       1            t
4       1            t

After CHECKPOINT:

buf_id  usage_count  dirty
1       1            f
2       1            f
3       1            f
4       1            f

After VACUUM:

buf_id  usage_count  dirty
1       1            t
2       1            t
3       1            t
4       1            t

As soon as a backend asks for a buffer, the situation is defused, as the backend will do a full clock sweep and decrement the usage_count of each buffer to 0, letting the bgwriter LRU scan clean them. Having the buffer cache full of dirty buffers is not a problem on its own, so this only becomes a performance issue if you then issue another large COPY etc. that needs those buffers, and you now have to write them at a busy time. This is a corner case that might not be worth worrying about. It's also mitigated by the fact that the OS cache is most likely clean after a period of idle time, and should be able to absorb the write burst. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] Load Distributed Checkpoints, take 3
Heikki Linnakangas [EMAIL PROTECTED] writes: Tom Lane wrote: This argument supposes that the bgwriter will do nothing while the COPY is proceeding. It will clean buffers ahead of the COPY, but it won't write the buffers COPY leaves behind since they have usage_count=1. Yeah, and they don't *need* to be written until the clock sweep has passed over them once. I'm not impressed with the idea of writing buffers because we might need them someday; that just costs extra I/O due to re-dirtying in too many scenarios. (Note that COPY per se will not trigger this behavior anyway, since it will act in a limited number of buffers because of the recent buffer access strategy patch.) regards, tom lane
Re: [PATCHES] Load Distributed Checkpoints, take 3
Tom Lane wrote: (Note that COPY per se will not trigger this behavior anyway, since it will act in a limited number of buffers because of the recent buffer access strategy patch.) Actually we dropped it from COPY, because it didn't seem to improve performance in the tests we ran. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] Load Distributed Checkpoints, take 3
Heikki Linnakangas [EMAIL PROTECTED] writes: Tom Lane wrote: (Note that COPY per se will not trigger this behavior anyway, since it will act in a limited number of buffers because of the recent buffer access strategy patch.) Actually we dropped it from COPY, because it didn't seem to improve performance in the tests we ran. Who's we? AFAICS, CVS HEAD will treat a large copy the same as any other large heapscan. regards, tom lane
Re: [PATCHES] Load Distributed Checkpoints, take 3
Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: Tom Lane wrote: (Note that COPY per se will not trigger this behavior anyway, since it will act in a limited number of buffers because of the recent buffer access strategy patch.) Actually we dropped it from COPY, because it didn't seem to improve performance in the tests we ran. Who's we? AFAICS, CVS HEAD will treat a large copy the same as any other large heapscan. Umm, I'm talking about populating a table with COPY *FROM*. That's not a heap scan at all. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] Load Distributed Checkpoints, take 3
Heikki Linnakangas [EMAIL PROTECTED] writes: Tom Lane wrote: Who's we? AFAICS, CVS HEAD will treat a large copy the same as any other large heapscan. Umm, I'm talking about populating a table with COPY *FROM*. That's not a heap scan at all. No wonder we're failing to communicate. I assumed you were talking about copy-to-file. Copy-from-file is going to be creating WAL entries hence the no-checkpoint case doesn't apply anyway, no? [ thinks ... ] Oh, you must be positing the case where the recently added skip-WAL-if-table-is-new-in-this-transaction optimization applies. Well, that thing could probably do with some more work anyway (I wonder why it's using shared buffers at all anymore). I don't really think that case should be allowed to drive our thinking about how the bgwriter should work. regards, tom lane
Re: [PATCHES] Load Distributed Checkpoints, take 3
Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: Tom Lane wrote: Who's we? AFAICS, CVS HEAD will treat a large copy the same as any other large heapscan. Umm, I'm talking about populating a table with COPY *FROM*. That's not a heap scan at all. No wonder we're failing to communicate. I assumed you were talking about copy-to-file. Copy-from-file is going to be creating WAL entries hence the no-checkpoint case doesn't apply anyway, no? We are indeed having trouble communicating :(. No, I'm not talking about the new non-WAL-logged COPY optimization. COPY FROM *would* create WAL entries, and the next checkpoint would clean them. So far so good. But then you run VACUUM, as you often do after loading a table, to set all hint bits. That will *not* generate WAL, and the next checkpoint is skipped. To recap, the sequence is:

1. COPY FROM
2. checkpoint
3. VACUUM

Now you have a buffer cache full of dirty buffers with usage_count=1, and no WAL activity since the last checkpoint. They will not be flushed until:

a) WAL activity happens and the next checkpoint comes
b) the database is shut down, or a manual CHECKPOINT is issued
c) the clock hand advances and decrements the usage_counts

It's a corner case. Probably not a problem in practice; you usually run CREATE INDEX after loading a table, for example. But it exists. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] Load Distributed Checkpoints, take 3
On Mon, 25 Jun 2007, Tom Lane wrote: right now, BgBufferSync starts over from the current clock-sweep point on each call --- that is, each bgwriter cycle. So it can't really be made to write very many buffers without excessive CPU work. Maybe we should redefine it to have some static state carried across bgwriter cycles The LRU portion restarts like that, and it's certainly not optimal. But the auto-tuning LRU patch that's already near application makes this much less of an issue, because it only does real work when buffers have been allocated, so the sweep point will have moved along. I'll add this idea to my list of things that would be nice to have as part of a larger rewrite; I think it's more trouble than it's worth to chase right now. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [PATCHES] Load Distributed Checkpoints, take 3
Heikki Linnakangas [EMAIL PROTECTED] writes: To recap, the sequence is: 1. COPY FROM 2. checkpoint 3. VACUUM Now you have buffer cache full of dirty buffers with usage_count=1, Well, it won't be very full, because VACUUM works in a limited number of buffers (and did even before the BufferAccessStrategy patch). I have no doubt that there are scenarios such as you are thinking about, but it definitely seems like a corner case that doesn't justify keeping the all-buffers scan. That scan is costing us extra I/O in ordinary non-corner cases, so it's not free to keep it. regards, tom lane
Re: [PATCHES] Load Distributed Checkpoints, take 3
On Tue, 26 Jun 2007, Tom Lane wrote: I have no doubt that there are scenarios such as you are thinking about, but it definitely seems like a corner case that doesn't justify keeping the all-buffers scan. That scan is costing us extra I/O in ordinary non-corner cases, so it's not free to keep it. And scenarios I'm concerned about but can't diagram as easily fall into this category as well. I agree that an LDC-enabled config would ship with the all-buffers scan turned off as redundant and wasteful, in which case the only cost to keep it is code baggage. But the fact that there are corner cases floating around this area is what makes me feel that removing it altogether is still a bit premature. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [PATCHES] Load Distributed Checkpoints, final patch
On Jun 26, 2007, at 13:49 , Heikki Linnakangas wrote: Maximum is 0.9, to leave some headroom for fsync and any other things that need to happen during a checkpoint. I think it might be more user-friendly to make the maximum 1 (meaning as much smoothing as you could possibly get) and internally reserve a certain amount off for whatever headroom might be required. It's more common for users to see a value range from 0 to 1 rather than 0 to 0.9. Michael Glaesemann grzm seespotcode net
Re: [PATCHES] Load Distributed Checkpoints, final patch
Heikki Linnakangas [EMAIL PROTECTED] writes: Barring any objections from committer, I'm finished with this patch. Sounds great, I'll start looking this over. I'm scheduling more DBT-2 tests at a high # of warehouses per Greg Smith's suggestion just to see what happens, but I doubt that will change my mind on the above decisions. When do you expect to have those results? regards, tom lane
Re: [PATCHES] Load Distributed Checkpoints, final patch
Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: Barring any objections from committer, I'm finished with this patch. Sounds great, I'll start looking this over. I'm scheduling more DBT-2 tests at a high # of warehouses per Greg Smith's suggestion just to see what happens, but I doubt that will change my mind on the above decisions. When do you expect to have those results? In a few days. I'm doing long tests because the variability in the 1h tests was very high. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] Load Distributed Checkpoints, final patch
Michael Glaesemann wrote: On Jun 26, 2007, at 13:49 , Heikki Linnakangas wrote: Maximum is 0.9, to leave some headroom for fsync and any other things that need to happen during a checkpoint. I think it might be more user-friendly to make the maximum 1 (meaning as much smoothing as you could possibly get) and internally reserve a certain amount off for whatever headroom might be required. It's more common for users to see a value range from 0 to 1 rather than 0 to 0.9. It would then be counter-intuitive if you set it to 1.0, and see that your checkpoints consistently take 90% of the checkpoint interval. We could just allow any value up to 1.0, and note in the docs that you should leave some headroom, unless you don't mind starting the next checkpoint a bit late. That actually sounds pretty good. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
[PATCHES] pg_ctl -w (wait) option on Windows
The attached patch implements the following fixes to pg_ctl on Windows:

- Fix the -w (wait) option to work in Windows service mode, per bug #3382. This is required on Windows because pg_ctl reports running status to the service control manager when actually still in recovery/startup, causing any dependent services to start too early.
- Prevent the -w option being passed to the postmaster.
- Read the postmaster options file when starting as a Windows service.

As a side note, I noticed whilst testing this that the -w option relies on libpq to provide the username with which to connect to the server to test the connection. This will fail if that user is not the one that initialized the cluster, or if the superuser name has been changed. I hit the former case because I initdb'ed in my dev environment, and then set up the server to run as a service under a dedicated user account. I realise this won't affect normal users, and falls neatly into the 'don't do that then' category, but I thought I'd mention it :-)

Regards, Dave

Index: pg_ctl.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/bin/pg_ctl/pg_ctl.c,v
retrieving revision 1.80
diff -c -r1.80 pg_ctl.c
*** pg_ctl.c	31 May 2007 15:13:04 -0000	1.80
--- pg_ctl.c	26 Jun 2007 19:24:06 -0000
***************
*** 126,136 ****
  static void WINAPI pgwin32_ServiceMain(DWORD, LPTSTR *);
  static void pgwin32_doRunAsService(void);
  static int	CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION * processInfo);
  #endif
  static pgpid_t get_pgpid(void);
  static char **readfile(const char *path);
! static int	start_postmaster(void);
! static bool test_postmaster_connection(void);
  static bool postmaster_is_alive(pid_t pid);
  static char def_postopts_file[MAXPGPATH];
--- 126,147 ----
  static void WINAPI pgwin32_ServiceMain(DWORD, LPTSTR *);
  static void pgwin32_doRunAsService(void);
  static int	CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION * processInfo);
+ 
+ static SERVICE_STATUS status;
+ static SERVICE_STATUS_HANDLE hStatus = (SERVICE_STATUS_HANDLE) 0;
+ static HANDLE shutdownHandles[2];
+ static pid_t postmasterPID = -1;
+ 
+ #define shutdownEvent	shutdownHandles[0]
+ #define postmasterProcess	shutdownHandles[1]
  #endif
+ 
  static pgpid_t get_pgpid(void);
  static char **readfile(const char *path);
! static int	start_postmaster(void);
! static void read_post_opts(void);
! 
! static bool test_postmaster_connection(bool);
  static bool postmaster_is_alive(pid_t pid);
  static char def_postopts_file[MAXPGPATH];
***************
*** 391,405 ****
! /* Find the pgport and try a connection */
  static bool
! test_postmaster_connection(void)
  {
! 	PGconn	   *conn;
! 	bool		success = false;
! 	int			i;
! 	char		portstr[32];
! 	char	   *p;
  	*portstr = '\0';
--- 402,421 ----
! /*
!  * Find the pgport and try a connection
!  * Note that the checkpoint parameter enables a Windows service control
!  * manager checkpoint, it's got nothing to do with database checkpoints!!
!  */
  static bool
! test_postmaster_connection(bool do_checkpoint)
  {
! 	PGconn	   *conn;
! 	bool		success = false;
! 	int			i;
! 	char		portstr[32];
! 	char	   *p;
! 	char		connstr[128];	/* Should be way more than enough! */
  	*portstr = '\0';
***************
*** 464,473 ****
  	if (!*portstr)
  		snprintf(portstr, sizeof(portstr), "%d", DEF_PGPORT);
  	for (i = 0; i < wait_seconds; i++)
  	{
! 		if ((conn = PQsetdbLogin(NULL, portstr, NULL, NULL,
! 								 "postgres", NULL, NULL)) != NULL &&
  			(PQstatus(conn) == CONNECTION_OK ||
  			 (strcmp(PQerrorMessage(conn),
  					 PQnoPasswordSupplied) == 0)))
--- 480,491 ----
  	if (!*portstr)
  		snprintf(portstr, sizeof(portstr), "%d", DEF_PGPORT);
+ 	/* We need to set a connect timeout otherwise on Windows the SCM will probably timeout first */
+ 	snprintf(connstr, sizeof(connstr), "dbname=postgres port=%s connect_timeout=5", portstr);
+ 
  	for (i = 0; i < wait_seconds; i++)
  	{
! 		if ((conn = PQconnectdb(connstr)) != NULL &&
  			(PQstatus(conn) == CONNECTION_OK ||
  			 (strcmp(PQerrorMessage(conn),
  					 PQnoPasswordSupplied) == 0)))
***************
*** 479,485 ****
  	else
  	{
  		PQfinish(conn);
! 		print_msg(".");
  		pg_usleep(1000000);		/* 1 sec */
  	}
  }
--- 497,521 ----
  	else
  	{
  		PQfinish(conn);
! 
! #if defined(WIN32)
! 		if (do_checkpoint)
!
Re: [PATCHES] Load Distributed Checkpoints, final patch
Heikki Linnakangas [EMAIL PROTECTED] writes: We could just allow any value up to 1.0, and note in the docs that you should leave some headroom, unless you don't mind starting the next checkpoint a bit late. That actually sounds pretty good. What exactly happens if a checkpoint takes so long that the next checkpoint starts? Aside from it not actually helping, is there much reason to avoid this situation? Have we ever actually tested it? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] Load Distributed Checkpoints, final patch
Gregory Stark wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: We could just allow any value up to 1.0, and note in the docs that you should leave some headroom, unless you don't mind starting the next checkpoint a bit late. That actually sounds pretty good. What exactly happens if a checkpoint takes so long that the next checkpoint starts. Aside from it not actually helping is there much reason to avoid this situation? Not really. We might run out of preallocated WAL segments, and will have to create more. Recovery could be longer than expected since the real checkpoint interval ends up being longer, but you can't make very accurate recovery time estimations anyway. Have we ever actually tested it? I haven't. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] Load Distributed Checkpoints, final patch
On Tue, 26 Jun 2007, Gregory Stark wrote: What exactly happens if a checkpoint takes so long that the next checkpoint starts. Aside from it not actually helping is there much reason to avoid this situation? Have we ever actually tested it? More segments get created, and because of how they are cleared at the beginning this causes its own mini I/O storm through the same buffered write channel the checkpoint writes are going into (which may or may not be the same way normal WAL writes go, depending on whether you're using O_[D]SYNC WAL writes). I've seen some weird and intermittent breakdowns from the contention that occurs when this happens, and it's certainly something to be avoided. To test it you could just use a big buffer cache, write like mad to it, and make checkpoint_segments smaller than it should be for that workload. It's easy enough to kill yourself exactly this way right now though, and the fact that LDC gives you a parameter to aim this particular foot-gun more precisely isn't a big deal IMHO as long as the documentation is clear. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [PATCHES] Load Distributed Checkpoints, final patch
Heikki Linnakangas [EMAIL PROTECTED] writes: We could just allow any value up to 1.0, and note in the docs that you should leave some headroom, unless you don't mind starting the next checkpoint a bit late. That actually sounds pretty good. Yeah, that sounds fine. There isn't actually any harm in starting a checkpoint later than otherwise expected, is there? The worst consequence I can think of is a backend having to take time to manufacture a new xlog segment, because we didn't finish a checkpoint in time to recycle old ones. This might be better handled by allowing a bit more slop in the number of recycled-into-the-future xlog segments. Come to think of it, shouldn't we be allowing some extra slop in the number of future segments to account for xlog archiving delays, when that's enabled? regards, tom lane
Re: [PATCHES] Load Distributed Checkpoints, final patch
Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: We could just allow any value up to 1.0, and note in the docs that you should leave some headroom, unless you don't mind starting the next checkpoint a bit late. That actually sounds pretty good. Yeah, that sounds fine. There isn't actually any harm in starting a checkpoint later than otherwise expected, is there? The worst consequence I can think of is a backend having to take time to manufacture a new xlog segment, because we didn't finish a checkpoint in time to recycle old ones. This might be better handled by allowing a bit more slop in the number of recycled-into-the-future xlog segments. Come to think of it, shouldn't we be allowing some extra slop in the number of future segments to account for xlog archiving delays, when that's enabled? XLogFileSlop is currently 2 * checkpoint_segments + 1 since last checkpoint, which is just enough to accommodate a checkpoint that lasts the full checkpoint interval. If we want to keep as much slop there as before, then yes that should be increased to (2 + checkpoint_completion_target) * checkpoint_segments + 1, or just 3 * checkpoint_segments if we want to keep it simple. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
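For concreteness, the recycling slop Heikki describes works out as simple arithmetic. A small sketch (the function name is made up; the real bookkeeping lives in xlog.c's XLOGfileslop logic):

```python
def xlog_file_slop(checkpoint_segments, completion_target=None):
    """Number of future xlog segments to recycle rather than delete,
    counted from the last checkpoint.

    Current rule: 2 * checkpoint_segments + 1, just enough to cover a
    checkpoint that takes the full checkpoint interval.  With spread
    checkpoints, writes continue for completion_target of the next
    interval, so the proposed rule adds that fraction back in.
    """
    if completion_target is None:
        return 2 * checkpoint_segments + 1                    # existing rule
    return (2 + completion_target) * checkpoint_segments + 1  # proposed rule

# With the default checkpoint_segments = 3:
print(xlog_file_slop(3))        # 7 segments under the current rule
print(xlog_file_slop(3, 0.5))   # 8.5 -> rounds up to 9 in practice
```

Note that with completion_target = 1.0 the proposed rule becomes 3 * checkpoint_segments + 1, which matches the "just 3 * checkpoint_segments" simplification up to the trailing +1.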
Re: [PATCHES] Load Distributed Checkpoints, take 3
On Tue, 26 Jun 2007, Tom Lane wrote: I'm not impressed with the idea of writing buffers because we might need them someday; that just costs extra I/O due to re-dirtying in too many scenarios. This is kind of an interesting statement to me because it really highlights the difference in how I think about this problem versus how you see it. As far as I'm concerned, there's a hierarchy of I/O the database needs to finish that goes like this:
1) Client back-end writes (blocked until a buffer appears)
2) LRU writes so (1) doesn't happen
3) Checkpoint writes
4) Dirty pages with a non-zero usage count
In my view of the world, there should be one parameter for a target rate of how much I/O you can stand under normal use, and the background writer should work its way as far down this chain as it can until it meets that. If there are plenty of clean buffers for the expected new allocations and there's no checkpoint going on, by all means write out some buffers we might re-dirty if there's I/O to spare. If you write them twice, so what? You didn't even get to that point as an option until all the important stuff was taken care of and the system was near idle. The elimination of the all-scan background writer means that true hot and dirty spots in the buffer cache, like popular index blocks on a heavily updated table that never get a zero usage_count, are never going to be written out other than as part of the checkpoint process. That's OK for now, but I'd like it to be the case that one day the database's I/O scheduling would eventually get to those, in order to optimize performance in the kind of bursty scenarios I've been mentioning lately. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
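Greg's four-level hierarchy amounts to a budgeted loop that works down the priority list until the round's I/O allowance runs out. A toy sketch with entirely hypothetical names and data structures (the real bgwriter operates on shared buffer headers, not dicts):

```python
# Hypothetical sketch of a background writer that works down Greg's
# priority list until it exhausts a per-round I/O budget.  Buffers are
# dicts with "dirty" and "usage_count"; real buffers carry much more.

def bgwriter_round(buffers, io_budget, checkpoint_active):
    """Write dirty buffers in priority order, stopping at io_budget."""
    written = []

    def take(candidates):
        nonlocal io_budget
        for buf in candidates:
            if io_budget <= 0:
                return
            buf["dirty"] = False
            io_budget -= 1
            written.append(buf["id"])

    # Levels 1-2: LRU writes, so client back-ends never block on a write
    take(b for b in buffers if b["dirty"] and b["usage_count"] == 0)
    # Level 3: checkpoint writes, when a checkpoint is in progress
    if checkpoint_active:
        take(b for b in buffers if b["dirty"] and b.get("needs_ckpt"))
    # Level 4: opportunistic writes of hot dirty pages, only with
    # budget to spare -- these may be re-dirtied, and that's accepted
    take(b for b in buffers if b["dirty"] and b["usage_count"] > 0)
    return written
```

With a small budget only the LRU candidates get written; only a near-idle round ever reaches level 4, which is exactly the "so what if we write them twice" case Greg describes.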
[PATCHES] psql: add volatility to \df+
Attached is a patch that adds information about function volatility to the output of psql's \df+ slash command. I'll apply this to HEAD tomorrow, barring any objections. -Neil

Index: src/bin/psql/describe.c
===================================================================
RCS file: /home/neilc/postgres/cvs_root/pgsql/src/bin/psql/describe.c,v
retrieving revision 1.155
diff -p -c -r1.155 describe.c
*** src/bin/psql/describe.c	19 Mar 2007 23:38:31 -0000	1.155
--- src/bin/psql/describe.c	26 Jun 2007 22:01:55 -0000
*** describeFunctions(const char *pattern, b
*** 205,215 ****
  	if (verbose)
  		appendPQExpBuffer(buf,
  				  ",\n r.rolname as \"%s\","
  				  "\n l.lanname as \"%s\","
  				  "\n p.prosrc as \"%s\","
  				  "\n pg_catalog.obj_description(p.oid, 'pg_proc') as \"%s\"",
!				  _("Owner"), _("Language"), _("Source code"), _("Description"));
  
  	if (!verbose)
--- 205,220 ----
  	if (verbose)
  		appendPQExpBuffer(buf,
+				  ",\n CASE\n"
+				  "  WHEN p.provolatile = 'i' THEN 'immutable'\n"
+				  "  WHEN p.provolatile = 's' THEN 'stable'\n"
+				  "  WHEN p.provolatile = 'v' THEN 'volatile'\n"
+				  " END as \"%s\""
  				  ",\n r.rolname as \"%s\","
  				  "\n l.lanname as \"%s\","
  				  "\n p.prosrc as \"%s\","
  				  "\n pg_catalog.obj_description(p.oid, 'pg_proc') as \"%s\"",
!				  _("Volatility"), _("Owner"), _("Language"), _("Source code"), _("Description"));

---(end of broadcast)--- TIP 6: explain analyze is your friend
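The CASE expression in the patch is a straight mapping from pg_proc.provolatile codes to display labels. For illustration only, the same mapping in Python terms:

```python
# The three provolatile codes in pg_proc and the labels the patch
# displays for them in \df+ output.
VOLATILITY = {
    "i": "immutable",   # result depends only on the arguments
    "s": "stable",      # result is stable within a single query
    "v": "volatile",    # result may change even within a single query
}

print(VOLATILITY["i"])   # immutable
```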
Re: [PATCHES] Load Distributed Checkpoints, final patch
On Tue, 26 Jun 2007, Heikki Linnakangas wrote: I'm scheduling more DBT-2 tests at a high # of warehouses per Greg Smith's suggestion just to see what happens, but I doubt that will change my mind on the above decisions. I don't either; at worst I'd expect a small documentation update, perhaps with some warnings based on what's discovered there. The form you've added checkpoint_completion_target in is sufficient to address all the serious concerns I had; I can turn it off, I can smooth just a bit without increasing recovery time too much, or I can go all out smooth. Certainly no one should consider waiting for the tests I asked you about a hurdle to getting this patch committed; slowing that down was never my intention in bringing that up. I'm just curious to see if anything scurries out of some of the darker corners in this area when they're illuminated. I'd actually like to see this get committed relatively soon because there are two interleaved merges stuck behind this one (the more verbose logging patch and the LRU modifications). -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PATCHES] Load Distributed Checkpoints, take 3
Greg Smith [EMAIL PROTECTED] writes: If you write them twice, so what? You didn't even get to that point as an option until all the important stuff was taken care of and the system was near idle. Well even if it's near idle you were still occupying the i/o system for a few milliseconds. If someone else came in with a block request at that time you extended their response time by that much. The elimination of the all-scan background writer means that true hot and dirty spots in the buffer cache, like popular index blocks on a heavily updated table that never get a zero usage_count, are never going to be written out other than as part of the checkpoint process. If they're really popular blocks on a heavily updated table then they really don't buy us anything to write them out unnecessarily. The case where they help us is when they weren't really popular but we're not doing enough to get around to writing them out and then when we do need to write it out the system's busy. In that case we wasted a chance to write them out when the system was more idle. But don't forget that we're still going through the OS's buffers. That will smooth out a lot of the i/o we generate anyways. Just because Postgres is idle doesn't mean the OS isn't busy flushing the buffers we wrote out when it was busy. That's OK for now, but I'd like it to be the case that one day the database's I/O scheduling would eventually get to those, in order to optimize performance in the kind of bursty scenarios I've been mentioning lately. I think the feeling is that the bursty scenarios are really corner cases. Heikki described a case where you're dirtying tons of blocks without triggering any WAL. That can happen but it's pretty peculiar. I can imagine a scenario where you have a system that's very busy for 60s and then idle for 60s repeatedly. And for some reason you configure a checkpoint_timeout on the order of 20m or so (assuming you're travelling precisely 60mph). 
In that scenario bgwriter's lru scan has to fight to keep up for 60s while it's mostly writing out dirty pages that it could have flushed out during the idle time. Effectively you're only making use of half the i/o bandwidth since bgwriter doesn't do any work for half the duty cycle. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
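The "half the i/o bandwidth" claim in that scenario is just duty-cycle arithmetic; making it concrete with illustrative numbers (not measurements):

```python
# Illustrative arithmetic for Gregory's 60s-busy / 60s-idle scenario.
# An LRU-only background writer does work only while backends are
# dirtying pages, so the idle half of each cycle contributes nothing.

cycle = 120           # seconds per duty cycle: 60 busy + 60 idle
busy = 60             # seconds of the cycle the writer actually works
writer_rate = 100     # pages/second the writer can push (assumed)

lru_only = writer_rate * busy          # cleans only in the busy half
with_idle_flush = writer_rate * cycle  # could also clean when idle

print(lru_only / with_idle_flush)      # 0.5 -> half the bandwidth used
```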
Re: [PATCHES] Load Distributed Checkpoints, take 3
Gregory Stark wrote: I can imagine a scenario where you have a system that's very busy for 60s and then idle for 60s repeatedly. And for some reason you configure a checkpoint_timeout on the order of 20m or so (assuming you're travelling precisely 60mph). Is that Scottish m? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PATCHES] psql: add volatility to \df+
Neil Conway [EMAIL PROTECTED] writes: Attached is a patch that adds information about function volatility to the output of psql's \df+ slash command. I'll apply this to HEAD tomorrow, barring any objections. +1, but are there not any documentation changes to make? regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] psql: add volatility to \df+
Tom Lane wrote: Neil Conway [EMAIL PROTECTED] writes: Attached is a patch that adds information about function volatility to the output of psql's \df+ slash command. I'll apply this to HEAD tomorrow, barring any objections. +1, but are there not any documentation changes to make? Well here is a question (just because I haven't seen it): is there a list of functions and their volatility in the docs? Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] psql: add volatility to \df+
On Tue, 2007-26-06 at 20:29 -0700, Joshua D. Drake wrote: Tom Lane wrote: +1, but are there not any documentation changes to make? Sure, I'll update the psql ref page. Well here is a question (just because I haven't seen it) is there a list of functions and their volatility in the docs? Not relevant to this thread, but AFAICS there is no explicit list of the volatilities of the builtin functions. -Neil ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] Cancel autovacuum conflicting with DROP TABLE
ITAGAKI Takahiro wrote: ITAGAKI Takahiro [EMAIL PROTECTED] wrote: Here is a patch that cancels autovacuum workers conflicting with DROP TABLE, TRUNCATE and CLUSTER. It was discussed here: http://archives.postgresql.org/pgsql-hackers/2007-06/msg00556.php I made an adjustment for the latest 'more autovacuum fixes' patch. http://archives.postgresql.org/pgsql-patches/2007-06/msg00217.php Now, autovacuum workers handle ProcDie signals by raising ERROR instead of FATAL. The exception is caught by the worker and converted to the following logs.
SIGINT -- Cancel the current job.
  LOG: autovacuum on db.schema.table is canceled
SIGTERM -- Cancel all jobs.
  LOG: autovacuum on db is canceled
Thanks for the patch. I think there are actually three patches in here:
1. changing SIGINT so that it cancels the current table instead of shutting down the entire worker.
2. changing DROP TABLE and TRUNCATE so that they cancel an autovac worker by sending SIGINT.
3. changing the interrupt code so that autovac workers are terminated with ERROR instead of FATAL, knowing that the final outcome is the same.
If I'm not mistaken, the only point of that change is to be able to change the error message, is that correct? I don't feel very comfortable with patch (3). I would prefer that we keep the FATAL errcode and select a different error message in ProcessInterrupts instead; I don't see the point in complicating the sigjmp target without need. I agree with the (2) change. Change (1) I don't yet feel comfortable commenting on. It seems strange to me, and although it seems like it would be safe, I cannot help having a strange feeling about it. I'll try to digest it a bit more. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
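The signal semantics under discussion in (1) can be modeled with a small stand-in (a Python toy, not the actual C signal handling; the class and method names are made up): SIGINT cancels only the table currently being vacuumed, while SIGTERM cancels all of the worker's remaining jobs.

```python
# Toy model of the proposed autovacuum worker signal semantics.
# Real workers raise and catch an ERROR between interrupt checks;
# here the handlers just set flags that run() consults per table.

class AutovacWorker:
    def __init__(self, tables):
        self.tables = list(tables)
        self.cancel_current = False   # set by the SIGINT handler
        self.shutdown = False         # set by the SIGTERM handler
        self.done = []

    def on_sigint(self, *args):
        self.cancel_current = True

    def on_sigterm(self, *args):
        self.shutdown = True

    def run(self):
        for table in self.tables:
            if self.shutdown:
                # SIGTERM: "autovacuum on db is canceled"
                break
            if self.cancel_current:
                # SIGINT: "autovacuum on db.schema.table is canceled";
                # the error is caught and the worker moves on.
                self.cancel_current = False
                continue
            self.done.append(table)
```

In this model a SIGINT delivered while table "a" is being processed lets the worker go on to vacuum table "b", which is the behavior change (1) introduces relative to shutting down the whole worker.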
Re: [PATCHES] psql: add volatility to \df+
Tom Lane wrote: Neil Conway [EMAIL PROTECTED] writes: Attached is a patch that adds information about function volatility to the output of psql's \df+ slash command. I'll apply this to HEAD tomorrow, barring any objections. +1, but are there not any documentation changes to make? +1 as well, but I'm wondering whether the output is too wide. Is it possible to find some shorter string to convey the same meaning? Ideally, without being a single character. -- Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4 "Take out whatever book your religion considers the right one to find the prayer that brings peace to your soul. Then reboot the computer and see if it works." (Carlos Duclós) ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match