Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Heikki Linnakangas

Tom Lane wrote:

Hmm.  But if we're going to do that, we might as well have a checkpoint
for our troubles, no?  The reason for the current design is the
assumption that a bgwriter_all scan is less burdensome than a
checkpoint, but that is no longer true given this rewrite.


Per comments in CreateCheckPoint, another reason to also skip the extra 
checkpoints is to avoid writing two checkpoints to the same page, risking 
losing both on a crash:



 * If this isn't a shutdown or forced checkpoint, and we have not inserted
 * any XLOG records since the start of the last checkpoint, skip the
 * checkpoint.  The idea here is to avoid inserting duplicate checkpoints
 * when the system is idle. That wastes log space, and more importantly it
 * exposes us to possible loss of both current and previous checkpoint
 * records if the machine crashes just as we're writing the update.
 * (Perhaps it'd make even more sense to checkpoint only when the previous
 * checkpoint record is in a different xlog page?)
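
In code terms, the test amounts to something like the following sketch 
(illustrative only, not the actual CreateCheckPoint() source; the variable 
names are made up):

    /*
     * Illustrative sketch: if nothing has been inserted into WAL since the
     * start of the last checkpoint, writing another checkpoint record is
     * redundant and only risks putting two checkpoint records on the same
     * xlog page.
     */
    if (!shutdown_requested && !forced_checkpoint &&
        current_insert_ptr == last_checkpoint_insert_ptr)
        return;                 /* skip the redundant checkpoint */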


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PATCHES] pgstat_drop_relation bugfix

2007-06-26 Thread ITAGAKI Takahiro
I wrote:
  pgstat_drop_relation() is expecting relid (pg_class.oid) as the argument,
  but we pass it relfilenode.
 I'm trying to fix the bug, because there is a possibility that some useless
 statistics data continues to occupy parts of the statistics table.

Here is a patch to fix undropped entries in the runtime statistics table.
Now smgr records the relation oids and uses them to drop entries
corresponding to the relations.

We could make stray entries easily using the following queries:
  CREATE TABLE test (i integer);
  INSERT INTO test VALUES(1);
  TRUNCATE test;
  DROP TABLE test;

Since we don't discard any stat entries except via pg_stat_reset(),
those useless entries would cause a memory leak, though a very minor one.

I feel my fix is uncool; let me know if there are better ways.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



pgstat_drop_relation.patch
Description: Binary data

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] [HACKERS] msvc and vista fun

2007-06-26 Thread Magnus Hagander
On Mon, Jun 25, 2007 at 03:19:32PM -0400, Andrew Dunstan wrote:
 
 
 I wrote:
 
 
 Would making a change like this in those 12 places be so ugly?
 
 
 Specifically, I propose the following patch, which should fix the issues 
 buildfarm apparently has with the XP command shell (or some incarnations 
 of it).

Yeah. Dave has verified that it's actually because of the XP stuff (he can
repro the failure on XP), so I guess we don't have much choice.


 Strictly speaking, we don't need the changes in builddoc.bat and 
 install.bat, but I did them for the sake of completeness. Since it 
 should not affect anyone not setting XP_EXIT_FIX it seems fairly low risk.

I've applied it, except for builddoc.bat. Since we didn't send out an
exit code there anyway, it makes no difference. I left the changes in
install.bat because there's an actual exit code there.

//Magnus

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Heikki Linnakangas

Tom Lane wrote:

Greg Smith [EMAIL PROTECTED] writes:
The way transitions between completely idle and all-out bursts happen was 
one problematic area I struggled with.  Since the LRU point doesn't move 
during the idle parts, and the lingering buffers have a usage_count > 0, the 
LRU scan won't touch them; the only way to clear out a bunch of dirty 
buffers left over from the last burst is with the all-scan.


One thing that might be worth changing is that right now, BgBufferSync
starts over from the current clock-sweep point on each call --- that is,
each bgwriter cycle.  So it can't really be made to write very many
buffers without excessive CPU work.  Maybe we should redefine it to have
some static state carried across bgwriter cycles, such that it would
write at most N dirty buffers per call, but scan through X percent of
the buffers, possibly across several calls, before returning to the (by
now probably advanced) clock-sweep point.  This would allow a larger
value of X to be used than is currently practical.  You might wish to
recheck the clock sweep point on each iteration just to make sure the
scan hasn't fallen behind it, but otherwise I don't see any downside.
The scenario where somebody re-dirties a buffer that was cleaned by the
bgwriter scan isn't a problem, because that buffer will also have had its
usage_count increased and thereby not be a candidate for replacement.
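
A minimal sketch of that idea (purely illustrative; none of these helper 
names exist in the bgwriter code, NBuffers being the only real symbol used):

    /*
     * Illustrative sketch of carrying scan state across bgwriter cycles:
     * scan_pos persists between calls; each call writes at most max_writes
     * dirty buffers but keeps walking through the pool across calls.
     */
    static int scan_pos = 0;        /* survives across BgBufferSync calls */

    static void
    bg_buffer_sync_sketch(int max_writes, int buffers_to_scan)
    {
        int writes = 0;

        /* make sure the scan hasn't fallen behind the clock-sweep point */
        if (scan_pos_is_behind_clock_sweep(scan_pos))
            scan_pos = current_clock_sweep_point();

        while (buffers_to_scan-- > 0 && writes < max_writes)
        {
            if (buffer_is_dirty(scan_pos))
            {
                write_buffer(scan_pos);
                writes++;
            }
            scan_pos = (scan_pos + 1) % NBuffers;
        }
    }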


Something along those lines could be useful. I've thought of that 
before, but it never occured to me that if a page in front of the clock 
hand is re-dirtied, it's no longer a candidate for replacement anyway...


I'm going to leave the all- and lru- bgwriter scans alone for now to get 
this LDC patch finished. We still have the bgwriter autotuning patch in 
the queue. Let's think about this in the context of that patch.


As a general comment on this subject, a lot of the work in LDC presumes 
you have an accurate notion of how close the next checkpoint is.


Yeah; this is one reason I was interested in carrying some write-speed
state across checkpoints instead of having the calculation start from
scratch each time.  That wouldn't help systems that sit idle a long time
and suddenly go nuts, but it seems to me that smoothing the write rate
across more than one checkpoint could help if the fluctuations occur
over a timescale of a few checkpoints.


Hmm. This problem only applies to checkpoints triggered by 
checkpoint_segments; time tends to move forward at a constant rate.


I'm not worried about small fluctuations or bursts. As I argued earlier, 
the OS still buffers the writes and should give some extra smoothing of 
the physical writes. I believe bursts of say 50-100 MB would easily fit 
in OS cache, as long as there's enough gap between them. I haven't 
tested that, though.


Here's a proposal for an algorithm to smooth bigger bursts:

The basic design is the same as before. We keep track of elapsed time 
and elapsed xlogs, and based on them we estimate how much progress in 
flushing the buffers we should've made by now, and then we catch up 
until we reach that. The estimate for time is the same. The estimate for 
xlogs gets more complicated:


Let's have a few definitions first:

Ro = elapsed segments / elapsed time, from previous checkpoint cycle. 
For example, 1.25 means that the checkpoint was triggered by 
checkpoint_segments, and we had spent 1/1.25 =  80% of 
checkpoint_timeout when we reached checkpoint_segments. 0.25 would mean 
that checkpoint was triggered by checkpoint_timeout, and we had spent 
25% of checkpoint_segments by then.


Rn = elapsed segments / elapsed time this far from current in-progress 
checkpoint.


t = elapsed time, as a fraction of checkpoint_timeout (0.0 - 1.0, though 
could be > 1 if next checkpoint is already due)
s = elapsed xlog segments, as a fraction of checkpoint_segments (0.0 - 
1.0, though could again be > 1 if next checkpoint is already due)


R = estimate for WAL segment consumption rate, as checkpoint_segments / 
checkpoint_timeout


R = Ro * (1 - t) + Rn * t

In other words, at the beginning of the checkpoint, we give more weight 
to the state carried over from previous checkpoint. As we go forward, 
more weight is given to the rate calculated from current cycle.


From R, we extrapolate how much progress we should have made by now:

Target progress = R * t

This would require saving just one number from previous cycle (Rn), and 
there is no requirement to call the estimation function at steady time 
intervals, for example. It gives pretty steady I/O rate even if there's 
big bursts in WAL activity, but still reacts to changes in the rate.
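
As a sketch of the arithmetic (illustrative only, not the patch code; the 
function and variable names are made up):

    /*
     * prev_rate is Ro, carried over from the previous checkpoint cycle;
     * t is elapsed time as a fraction of checkpoint_timeout;
     * s is elapsed segments as a fraction of checkpoint_segments.
     */
    static double
    target_progress_sketch(double prev_rate, double t, double s)
    {
        double cur_rate = (t > 0.0) ? s / t : prev_rate;    /* Rn */
        double rate = prev_rate * (1.0 - t) + cur_rate * t; /* R  */

        return rate * t;    /* how far the flushing should have progressed */
    }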


I'm not convinced this is worth the effort, though. First of all, this 
is only a problem if you use checkpoint_segments to control your 
checkpoints, so you can lower checkpoint_timeout to do more work during 
the idle periods. Secondly, with the optimization of not flushing 
buffers during checkpoint that were dirtied after the start of 

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Hmm.  But if we're going to do that, we might as well have a checkpoint
 for our troubles, no?  The reason for the current design is the
 assumption that a bgwriter_all scan is less burdensome than a
 checkpoint, but that is no longer true given this rewrite.

 Per comments in CreateCheckPoint, another reason to also skip the extra 
 checkpoints is to avoid writing two checkpoints to the same page, risking 
 losing both on a crash:

That's not the reason for the current design --- that logic existed long
before bgwriter did.

Anyway, if there are no XLOG records since the last checkpoint, there's
probably nothing in shared buffers that needs flushing.  There might be
some dirty hint-bits, but the only reason to push those out is to make
some free buffers available, and doing that is not checkpoint's job (nor
the all-buffers scan's job); that's what the LRU scan is for.

The only reason that the all-buffers scan was put in was to try to make
the next checkpoint cheaper.  That reason falls to the ground with LDC.
What we have left is merely excess I/O.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Heikki Linnakangas

Tom Lane wrote:

Anyway, if there are no XLOG records since the last checkpoint, there's
probably nothing in shared buffers that needs flushing.  There might be
some dirty hint-bits, but the only reason to push those out is to make
some free buffers available, and doing that is not checkpoint's job (nor
the all-buffers scan's job); that's what the LRU scan is for.


Yeah, except the LRU scan is not doing a very good job at that. It will 
ignore buffers with usage_count > 0, and it only scans 
bgwriter_lru_percent buffers ahead of the clock hand.


One pathological case is a COPY of a table slightly smaller than 
shared_buffers. That will fill the buffer cache. If you then have a 
checkpoint, and after that a SELECT COUNT(*), or a VACUUM, the buffer 
cache will be full of pages with just hint-bit-updates, but no WAL 
activity since last checkpoint.


But let's fix the LRU scan, rather than work around its deficiencies.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 ... that's what the LRU scan is for.

 Yeah, except the LRU scan is not doing a very good job at that. It will 
 ignore buffers with usage_count > 0, and it only scans 
 bgwriter_lru_percent buffers ahead of the clock hand.

Which is exactly in sync with which buffers are actually candidates for
replacement.  It could be looking further ahead of the clock hand, as
per my previous suggestion, but there is no need to push out buffers
with usage_count > 0 until after the sweep has decremented that to 0.
Doing so would tend to create excess I/O --- it makes it more likely
that multiple page-dirtying events on a page will result in separate
writes instead of a single write.

 One pathological case is a COPY of a table slightly smaller than 
 shared_buffers. That will fill the buffer cache. If you then have a 
 checkpoint, and after that a SELECT COUNT(*), or a VACUUM, the buffer 
 cache will be full of pages with just hint-bit-updates, but no WAL 
 activity since last checkpoint.

This argument supposes that the bgwriter will do nothing while the COPY
is proceeding.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:
One pathological case is a COPY of a table slightly smaller than 
shared_buffers. That will fill the buffer cache. If you then have a 
checkpoint, and after that a SELECT COUNT(*), or a VACUUM, the buffer 
cache will be full of pages with just hint-bit-updates, but no WAL 
activity since last checkpoint.


This argument supposes that the bgwriter will do nothing while the COPY
is proceeding.


It will clean buffers ahead of the COPY, but it won't write the buffers 
COPY leaves behind since they have usage_count=1.


Let me demonstrate this with an imaginary example with shared_buffers=4:

buf_id  usage_count dirty
1   0   f
2   0   f
3   0   f
4   0   f

After COPY

buf_id  usage_count dirty
1   1   t
2   1   t
3   1   t
4   1   t

CHECKPOINT:

buf_id  usage_count dirty
1   1   f
2   1   f
3   1   f
4   1   f

VACUUM:

buf_id  usage_count dirty
1   1   t
2   1   t
3   1   t
4   1   t

As soon as a backend asks for a buffer, the situation is defused as the 
backend will do a full clock sweep and decrement the usage_count of each 
buffer to 0, letting the bgwriter LRU scan clean them.


Having the buffer cache full of dirty buffers is not a problem on its 
own, so this only becomes a performance issue if you then issue another 
large COPY etc. that needs those buffers, and you now have to write them 
at the busy time.


This is a corner case that might not be worth worrying about. It's also 
mitigated by the fact that the OS cache is most likely clean after a 
period of idle time, and should be able to absorb the write burst.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 This argument supposes that the bgwriter will do nothing while the COPY
 is proceeding.

 It will clean buffers ahead of the COPY, but it won't write the buffers 
 COPY leaves behind since they have usage_count=1.

Yeah, and they don't *need* to be written until the clock sweep has
passed over them once.  I'm not impressed with the idea of writing
buffers because we might need them someday; that just costs extra
I/O due to re-dirtying in too many scenarios.

(Note that COPY per se will not trigger this behavior anyway, since it
will act in a limited number of buffers because of the recent buffer
access strategy patch.)

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Heikki Linnakangas

Tom Lane wrote:

(Note that COPY per se will not trigger this behavior anyway, since it
will act in a limited number of buffers because of the recent buffer
access strategy patch.)


Actually we dropped it from COPY, because it didn't seem to improve 
performance in the tests we ran.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 (Note that COPY per se will not trigger this behavior anyway, since it
 will act in a limited number of buffers because of the recent buffer
 access strategy patch.)

 Actually we dropped it from COPY, because it didn't seem to improve 
 performance in the tests we ran.

Who's we?  AFAICS, CVS HEAD will treat a large copy the same as any
other large heapscan.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:

Tom Lane wrote:

(Note that COPY per se will not trigger this behavior anyway, since it
will act in a limited number of buffers because of the recent buffer
access strategy patch.)


Actually we dropped it from COPY, because it didn't seem to improve 
performance in the tests we ran.


Who's we?  AFAICS, CVS HEAD will treat a large copy the same as any
other large heapscan.


Umm, I'm talking about populating a table with COPY *FROM*. That's not a 
heap scan at all.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Who's we?  AFAICS, CVS HEAD will treat a large copy the same as any
 other large heapscan.

 Umm, I'm talking about populating a table with COPY *FROM*. That's not a 
 heap scan at all.

No wonder we're failing to communicate.  I assumed you were talking
about copy-to-file.  Copy-from-file is going to be creating WAL entries
hence the no-checkpoint case doesn't apply anyway, no?

[ thinks ... ]  Oh, you must be positing the case where the recently
added skip-WAL-if-table-is-new-in-this-transaction optimization applies.
Well, that thing could probably do with some more work anyway (I wonder
why it's using shared buffers at all anymore).  I don't really think
that case should be allowed to drive our thinking about how the bgwriter
should work.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:

Tom Lane wrote:

Who's we?  AFAICS, CVS HEAD will treat a large copy the same as any
other large heapscan.


Umm, I'm talking about populating a table with COPY *FROM*. That's not a 
heap scan at all.


No wonder we're failing to communicate.  I assumed you were talking
about copy-to-file.  Copy-from-file is going to be creating WAL entries
hence the no-checkpoint case doesn't apply anyway, no?


We are indeed having trouble communicating :(.

No, I'm not talking about the new non-WAL-logged COPY optimization. COPY 
FROM *would* create WAL entries, and the next checkpoint would clean 
them. So far so good. But then you run VACUUM, as you often do after 
loading a table, to set all hint bits. That will *not* generate WAL, and 
the next checkpoint is skipped.


To recap, the sequence is:

1. COPY FROM
2. checkpoint
3. VACUUM

Now you have buffer cache full of dirty buffers with usage_count=1, and 
no WAL activity since last checkpoint. They will not be flushed until:

a) WAL activity happens and next checkpoint comes
b) database is shut down, or manual CHECKPOINT is issued
c) clock hand advances and decrements the usage_counts

It's a corner case. Probably not a problem in practice; you usually run 
CREATE INDEX after loading a table, for example. But it exists.
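
For instance, a sequence along these lines reproduces it (table and file 
names are just placeholders):

  CREATE TABLE big (i integer);
  COPY big FROM '/tmp/big.dat'; -- generates WAL, dirties most of shared_buffers
  CHECKPOINT;                   -- flushes the dirty pages, usage_count stays at 1
  VACUUM big;                   -- sets hint bits: pages dirty again, but no WAL
  -- nothing writes those buffers until (a), (b) or (c) above happens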


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Greg Smith

On Mon, 25 Jun 2007, Tom Lane wrote:

right now, BgBufferSync starts over from the current clock-sweep point 
on each call --- that is, each bgwriter cycle.  So it can't really be 
made to write very many buffers without excessive CPU work.  Maybe we 
should redefine it to have some static state carried across bgwriter 
cycles


The LRU portion restarts like that, and it's certainly not optimal.  But 
the auto-tuning LRU patch that's already near application makes this much 
less of an issue because it only does real work when buffers have been 
allocated, so the sweep point will have moved along.  I'll add this idea 
to my list of things that would be nice to have as part of a larger 
rewrite; I think it's more trouble than it's worth to chase right now.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 To recap, the sequence is:

 1. COPY FROM
 2. checkpoint
 3. VACUUM

 Now you have buffer cache full of dirty buffers with usage_count=1,

Well, it won't be very full, because VACUUM works in a limited number of
buffers (and did even before the BufferAccessStrategy patch).

I have no doubt that there are scenarios such as you are thinking about,
but it definitely seems like a corner case that doesn't justify keeping
the all-buffers scan.  That scan is costing us extra I/O in ordinary
non-corner cases, so it's not free to keep it.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Greg Smith

On Tue, 26 Jun 2007, Tom Lane wrote:


I have no doubt that there are scenarios such as you are thinking about,
but it definitely seems like a corner case that doesn't justify keeping
the all-buffers scan.  That scan is costing us extra I/O in ordinary
non-corner cases, so it's not free to keep it.


And scenarios I'm concerned about but can't diagram as easily fall into 
this category as well.  I agree that an LDC-enabled config would ship with 
the all-buffers scan turned off as redundant and wasteful, in which case the 
only cost of keeping it is code baggage.  But the fact that there are corner 
cases floating around this area is what makes me feel that removing it 
altogether is still a bit premature.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-06-26 Thread Michael Glaesemann


On Jun 26, 2007, at 13:49 , Heikki Linnakangas wrote:

Maximum is 0.9, to leave some headroom for fsync and any other  
things that need to happen during a checkpoint.


I think it might be more user-friendly to make the maximum 1 (meaning  
as much smoothing as you could possibly get) and internally reserve a  
certain amount off for whatever headroom might be required. It's more  
common for users to see a value range from 0 to 1 rather than 0 to 0.9.
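
A minimal sketch of that suggestion (purely illustrative, not code from the 
patch):

    /*
     * Illustrative only: let the user set 0.0 - 1.0, but clamp the value
     * the checkpoint scheduler actually uses so it never targets more than
     * 90% of the checkpoint interval.
     */
    static double
    effective_completion_target(double user_setting)
    {
        const double headroom = 0.1;

        if (user_setting > 1.0 - headroom)
            return 1.0 - headroom;
        return user_setting;
    }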


Michael Glaesemann
grzm seespotcode net



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-06-26 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Barring any objections from committer, I'm finished with this patch.

Sounds great, I'll start looking this over.

 I'm scheduling more DBT-2 tests at a high # of warehouses per Greg 
 Smith's suggestion just to see what happens, but I doubt that will 
 change my mind on the above decisions.

When do you expect to have those results?

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-06-26 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:

Barring any objections from committer, I'm finished with this patch.


Sounds great, I'll start looking this over.

I'm scheduling more DBT-2 tests at a high # of warehouses per Greg 
Smith's suggestion just to see what happens, but I doubt that will 
change my mind on the above decisions.


When do you expect to have those results?


In a few days. I'm doing long tests because the variability in the 1h 
tests was very high.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-06-26 Thread Heikki Linnakangas

Michael Glaesemann wrote:


On Jun 26, 2007, at 13:49 , Heikki Linnakangas wrote:

Maximum is 0.9, to leave some headroom for fsync and any other things 
that need to happen during a checkpoint.


I think it might be more user-friendly to make the maximum 1 (meaning as 
much smoothing as you could possibly get) and internally reserve a 
certain amount off for whatever headroom might be required. It's more 
common for users to see a value range from 0 to 1 rather than 0 to 0.9.


It would then be counter-intuitive if you set it to 1.0, and see that 
your checkpoints consistently take 90% of the checkpoint interval.


We could just allow any value up to 1.0, and note in the docs that you 
should leave some headroom, unless you don't mind starting the next 
checkpoint a bit late. That actually sounds pretty good.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[PATCHES] pg_ctl -w (wait) option on Windows

2007-06-26 Thread Dave Page

The attached patch implements the following fixes to pg_ctl on Windows:

- Fix the -w (wait) option to work in Windows service mode, per bug 
#3382. This is required on Windows because pg_ctl reports running status 
to the service control manager when actually still in recovery/startup, 
causing any dependent services to start too early.


- Prevent the -w option being passed to the postmaster.

- Read the postmaster options file when starting as a Windows service.

As a side note, I noticed whilst testing this that the -w option relies 
on libpq to provide the username with which to connect to the server to 
test the connection. This will fail if that user is not the one that 
initialized the cluster, or if the superuser name has been changed. I 
hit the former case because I initdb'ed in my dev environment, and then 
set up the server to run as a service under a dedicated user account. I 
realise this won't affect normal users, and falls neatly into the 'don't 
do that then' category, but I thought I'd mention it :-)


Regards, Dave
Index: pg_ctl.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/bin/pg_ctl/pg_ctl.c,v
retrieving revision 1.80
diff -c -r1.80 pg_ctl.c
*** pg_ctl.c	31 May 2007 15:13:04 -0000	1.80
--- pg_ctl.c	26 Jun 2007 19:24:06 -0000
***
*** 126,136 
  static void WINAPI pgwin32_ServiceMain(DWORD, LPTSTR *);
  static void pgwin32_doRunAsService(void);
  static int    CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION * 
processInfo);
  #endif
  static pgpid_t get_pgpid(void);
  static char **readfile(const char *path);
! static int    start_postmaster(void);
! static bool test_postmaster_connection(void);
  static bool postmaster_is_alive(pid_t pid);

  static char def_postopts_file[MAXPGPATH];
--- 126,147 
  static void WINAPI pgwin32_ServiceMain(DWORD, LPTSTR *);
  static void pgwin32_doRunAsService(void);
  static int    CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION * 
processInfo);
+
+ static SERVICE_STATUS status;
+ static SERVICE_STATUS_HANDLE hStatus = (SERVICE_STATUS_HANDLE) 0;
+ static HANDLE shutdownHandles[2];
+ static pid_t postmasterPID = -1;
+
+ #define shutdownEvent   shutdownHandles[0]
+ #define postmasterProcess shutdownHandles[1]
  #endif
+
  static pgpid_t get_pgpid(void);
  static char **readfile(const char *path);
! static int start_postmaster(void);
! static void read_post_opts(void);
!
! static bool test_postmaster_connection(bool);
  static bool postmaster_is_alive(pid_t pid);

  static char def_postopts_file[MAXPGPATH];
***
*** 391,405 



! /* Find the pgport and try a connection */
  static bool
! test_postmaster_connection(void)
  {
!   PGconn *conn;
!   bool    success = false;
!   int     i;
!   char    portstr[32];
!   char   *p;

*portstr = '\0';

--- 402,421 



! /*
!  * Find the pgport and try a connection
!  * Note that the checkpoint parameter enables a Windows service control
!  * manager checkpoint, it's got nothing to do with database checkpoints!!
!  */
  static bool
! test_postmaster_connection(bool do_checkpoint)
  {
!   PGconn  *conn;
!   bool    success = false;
!   int     i;
!   char    portstr[32];
!   char    *p;
!   char    connstr[128]; /* Should be way more than enough! */

*portstr = '\0';

***
*** 464,473 
if (!*portstr)
snprintf(portstr, sizeof(portstr), "%d", DEF_PGPORT);

for (i = 0; i < wait_seconds; i++)
{
!   if ((conn = PQsetdbLogin(NULL, portstr, NULL, NULL,
!    "postgres",
NULL, NULL)) != NULL &&
(PQstatus(conn) == CONNECTION_OK ||
 (strcmp(PQerrorMessage(conn),
 PQnoPasswordSupplied) == 0)))
--- 480,491 
if (!*portstr)
snprintf(portstr, sizeof(portstr), "%d", DEF_PGPORT);

+   /* We need to set a connect timeout otherwise on Windows the SCM will
probably timeout first */
+   snprintf(connstr, sizeof(connstr), "dbname=postgres port=%s connect_timeout=5", portstr);
+
for (i = 0; i < wait_seconds; i++)
{
!   if ((conn = PQconnectdb(connstr)) != NULL &&
(PQstatus(conn) == CONNECTION_OK ||
 (strcmp(PQerrorMessage(conn),
 PQnoPasswordSupplied) == 0)))
***
*** 479,485 
else
{
PQfinish(conn);
!   print_msg(".");
pg_usleep(1000000); /* 1 sec */
}
}
--- 497,521 
else
{
PQfinish(conn);
!
! #if defined(WIN32)
!   if (do_checkpoint)
!   

Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-06-26 Thread Gregory Stark

Heikki Linnakangas [EMAIL PROTECTED] writes:

 We could just allow any value up to 1.0, and note in the docs that you should
 leave some headroom, unless you don't mind starting the next checkpoint a bit
 late. That actually sounds pretty good.

What exactly happens if a checkpoint takes so long that the next checkpoint
starts. Aside from it not actually helping is there much reason to avoid this
situation? Have we ever actually tested it?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-06-26 Thread Heikki Linnakangas

Gregory Stark wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:


We could just allow any value up to 1.0, and note in the docs that you should
leave some headroom, unless you don't mind starting the next checkpoint a bit
late. That actually sounds pretty good.


What exactly happens if a checkpoint takes so long that the next checkpoint
starts. Aside from it not actually helping is there much reason to avoid this
situation? 


Not really. We might run out of preallocated WAL segments, and will have 
to create more. Recovery could be longer than expected since the real 
checkpoint interval ends up being longer, but you can't make very 
accurate recovery time estimations anyway.



Have we ever actually tested it?


I haven't.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-06-26 Thread Greg Smith

On Tue, 26 Jun 2007, Gregory Stark wrote:


What exactly happens if a checkpoint takes so long that the next checkpoint
starts. Aside from it not actually helping is there much reason to avoid this
situation? Have we ever actually tested it?


More segments get created, and because of how they are cleared at the 
beginning this causes its own mini-I/O storm through the same buffered 
write channel the checkpoint writes are going into (which may or may not 
be the same way normal WAL writes go, depending on whether you're using 
O_[D]SYNC WAL writes).  I've seen some weird and intermittent breakdowns 
from the contention that occurs when this happens, and it's certainly 
something to be avoided.


To test it you could just use a big buffer cache, write like mad to it, 
and make checkpoint_segments smaller than it should be for that workload. 
It's easy enough to kill yourself exactly this way right now though, and 
the fact that LDC gives you a parameter to aim this particular foot-gun 
more precisely isn't a big deal IMHO as long as the documentation is 
clear.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-06-26 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 We could just allow any value up to 1.0, and note in the docs that you 
 should leave some headroom, unless you don't mind starting the next 
 checkpoint a bit late. That actually sounds pretty good.

Yeah, that sounds fine.  There isn't actually any harm in starting a
checkpoint later than otherwise expected, is there?  The worst
consequence I can think of is a backend having to take time to
manufacture a new xlog segment, because we didn't finish a checkpoint
in time to recycle old ones.  This might be better handled by allowing
a bit more slop in the number of recycled-into-the-future xlog segments.

Come to think of it, shouldn't we be allowing some extra slop in the
number of future segments to account for xlog archiving delays, when
that's enabled?

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-06-26 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:
We could just allow any value up to 1.0, and note in the docs that you 
should leave some headroom, unless you don't mind starting the next 
checkpoint a bit late. That actually sounds pretty good.


Yeah, that sounds fine.  There isn't actually any harm in starting a
checkpoint later than otherwise expected, is there?  The worst
consequence I can think of is a backend having to take time to
manufacture a new xlog segment, because we didn't finish a checkpoint
in time to recycle old ones.  This might be better handled by allowing
a bit more slop in the number of recycled-into-the-future xlog segments.

Come to think of it, shouldn't we be allowing some extra slop in the
number of future segments to account for xlog archiving delays, when
that's enabled?


XLogFileSlop is currently 2 * checkpoint_segments + 1 since last 
checkpoint, which is just enough to accommodate a checkpoint that lasts 
the full checkpoint interval. If we want to keep as much slop there as 
before, then yes that should be increased to (2 + 
checkpoint_completion_target) * checkpoint_segments + 1, or just 3 * 
checkpoint_segments if we want to keep it simple.
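
For example, with checkpoint_segments = 10 and checkpoint_completion_target 
= 0.9 (illustrative numbers only):

  current slop:   2 * 10 + 1          = 21 segments
  proposed slop:  (2 + 0.9) * 10 + 1  = 30 segments
  simpler rule:   3 * 10              = 30 segments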


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Greg Smith

On Tue, 26 Jun 2007, Tom Lane wrote:

I'm not impressed with the idea of writing buffers because we might need 
them someday; that just costs extra I/O due to re-dirtying in too many 
scenarios.


This is kind of an interesting statement to me because it really 
highlights the difference in how I thinking about this problem from how 
you see it.  As far as I'm concerned, there's a hierarchy of I/O the 
database needs to finish that goes like this:


1) Client back-end writes (blocked until a buffer appears)
2) LRU writes so (1) doesn't happen
3) Checkpoint writes
4) Dirty pages with a non-zero usage count

In my view of the world, there should be one parameter for a target rate 
of how much I/O you can stand under normal use, and the background writer 
should work its way as far down this chain as it can until it meets that. 
If there's plenty of clean buffers for the expected new allocations and 
there's no checkpoint going on, by all means write out some buffers we 
might re-dirty if there's I/O to spare.  If you write them twice, so what? 
You didn't even get to that point as an option until all the important 
stuff was taken care of and the system was near idle.
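
A rough sketch of that policy (purely illustrative; none of these functions 
exist, and the numbers refer to the hierarchy above):

    /*
     * Spend up to io_budget writes per cycle, working down the priority
     * list and stopping as soon as the budget is used up.
     */
    static void
    background_write_cycle_sketch(int io_budget)
    {
        io_budget -= write_buffers_needed_by_backends(io_budget);      /* 1 */
        if (io_budget > 0)
            io_budget -= write_lru_buffers(io_budget);                 /* 2 */
        if (io_budget > 0 && checkpoint_in_progress())
            io_budget -= write_checkpoint_buffers(io_budget);          /* 3 */
        if (io_budget > 0)
            write_dirty_buffers_with_nonzero_usage_count(io_budget);   /* 4 */
    }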


The elimination of the all-scan background writer means that true hot and 
dirty spots in the buffer cache, like popular index blocks on a heavily 
updated table that never get a zero usage_count, are never going to be 
written out other than as part of the checkpoint process.  That's OK for 
now, but I'd like it to be the case that one day the database's I/O 
scheduling would eventually get to those, in order to optimize performance 
in the kind of bursty scenarios I've been mentioning lately.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PATCHES] psql: add volatility to \df+

2007-06-26 Thread Neil Conway
Attached is a patch that adds information about function volatility to
the output of psql's \df+ slash command. I'll apply this to HEAD
tomorrow, barring any objections.

-Neil

Index: src/bin/psql/describe.c
===================================================================
RCS file: /home/neilc/postgres/cvs_root/pgsql/src/bin/psql/describe.c,v
retrieving revision 1.155
diff -p -c -r1.155 describe.c
*** src/bin/psql/describe.c	19 Mar 2007 23:38:31 -0000	1.155
--- src/bin/psql/describe.c	26 Jun 2007 22:01:55 -0000
*** describeFunctions(const char *pattern, b
*** 205,215 
  
  	if (verbose)
  		appendPQExpBuffer(buf,
  		  ",\n  r.rolname as \"%s\",\n"
  		  "  l.lanname as \"%s\",\n"
  		  "  p.prosrc as \"%s\",\n"
  		  "  pg_catalog.obj_description(p.oid, 'pg_proc') as \"%s\"",
! 		  _("Owner"), _("Language"),
  		  _("Source code"), _("Description"));
  
  	if (!verbose)
--- 205,220 
  
  	if (verbose)
  		appendPQExpBuffer(buf,
+ 		  ",\n CASE\n"
+ 		  "  WHEN p.provolatile = 'i' THEN 'immutable'\n"
+ 		  "  WHEN p.provolatile = 's' THEN 'stable'\n"
+ 		  "  WHEN p.provolatile = 'v' THEN 'volatile'\n"
+ 		  "  END as \"%s\""
  		  ",\n  r.rolname as \"%s\",\n"
  		  "  l.lanname as \"%s\",\n"
  		  "  p.prosrc as \"%s\",\n"
  		  "  pg_catalog.obj_description(p.oid, 'pg_proc') as \"%s\"",
! 		  _("Volatility"), _("Owner"), _("Language"),
  		  _("Source code"), _("Description"));
  
  	if (!verbose)
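
The same mapping can be checked against the catalogs directly with a query 
like this (pg_proc.provolatile is the real column; the query itself is just 
an example):

  SELECT p.proname,
         CASE p.provolatile
              WHEN 'i' THEN 'immutable'
              WHEN 's' THEN 'stable'
              WHEN 'v' THEN 'volatile'
         END AS volatility
  FROM pg_catalog.pg_proc p
  ORDER BY p.proname
  LIMIT 10;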

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-06-26 Thread Greg Smith

On Tue, 26 Jun 2007, Heikki Linnakangas wrote:

I'm scheduling more DBT-2 tests at a high # of warehouses per Greg Smith's 
suggestion just to see what happens, but I doubt that will change my mind on 
the above decisions.


I don't either, at worst I'd expect a small documentation update perhaps 
with some warnings based on what's discovered there.  The form you've 
added checkpoint_completion_target in is sufficient to address all the 
serious concerns I had; I can turn it off, I can smooth just a bit without 
increasing recovery time too much, or I can go all out smooth.


Certainly no one should consider waiting for the tests I asked you about a 
hurdle to getting this patch committed; slowing that down was never my 
intention in bringing it up.  I'm just curious to see if anything 
scurries out of some the darker corners in this area when they're 
illuminated.  I'd actually like to see this get committed relatively soon 
because there's two interleaved merges stuck behind this one (the more 
verbose logging patch and the LRU modifications).


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Gregory Stark
Greg Smith [EMAIL PROTECTED] writes:

 If you write them twice, so what? You didn't even get to that point as an
 option until all the important stuff was taken care of and the system was
 near idle.

Well even if it's near idle you were still occupying the i/o system for a few
milliseconds. If someone else came in with a block request at that time you
extended their response time by that much. 

 The elimination of the all-scan background writer means that true hot and 
 dirty
 spots in the buffer cache, like popular index blocks on a heavily updated 
 table
 that never get a zero usage_count, are never going to be written out other 
 than
 as part of the checkpoint process.  

If they're really popular blocks on a heavily updated table then writing them
out unnecessarily really doesn't buy us anything. 

The case where they help us is when they weren't really popular, but we're not
doing enough work to get around to writing them out, and then when we do need
to write them out the system's busy. In that case we wasted a chance to write
them out when the system was more idle.

But don't forget that we're still going through the OS's buffers. That will
smooth out a lot of the i/o we generate anyways. Just because Postgres is idle
doesn't mean the OS isn't busy flushing the buffers we wrote out when it was
busy.

 That's OK for now, but I'd like it to be the case that one day the
 database's I/O scheduling would eventually get to those, in order to
 optimize performance in the kind of bursty scenarios I've been mentioning
 lately.

I think the feeling is that the bursty scenarios are really corner cases.
Heikki described a case where you're dirtying tons of blocks without
triggering any WAL. That can happen but it's pretty peculiar.

I can imagine a scenario where you have a system that's very busy for 60s and
then idle for 60s repeatedly. And for some reason you configure a
checkpoint_timeout on the order of 20m or so (assuming you're travelling
precisely 60mph). 

In that scenario bgwriter's lru scan has to fight to keep up for 60s while
it's mostly writing out dirty pages that it could have flushed out during the
idle time. Effectively you're only making use of half the i/o bandwidth since
bgwriter doesn't do any work for half the duty cycle.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Alvaro Herrera
Gregory Stark wrote:

 I can imagine a scenario where you have a system that's very busy for 60s and
 then idle for 60s repeatedly. And for some reason you configure a
 checkpoint_timeout on the order of 20m or so (assuming you're travelling
 precisely 60mph). 

Is that Scottish m?

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] psql: add volatility to \df+

2007-06-26 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 Attached is a patch that adds information about function volatility to
 the output of psql's \df+ slash command. I'll apply this to HEAD
 tomorrow, barring any objections.

+1, but are there not any documentation changes to make?

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] psql: add volatility to \df+

2007-06-26 Thread Joshua D. Drake

Tom Lane wrote:

Neil Conway [EMAIL PROTECTED] writes:

Attached is a patch that adds information about function volatility to
the output of psql's \df+ slash command. I'll apply this to HEAD
tomorrow, barring any objections.


+1, but are there not any documentation changes to make?


Well, here is a question (just because I haven't seen it): is there a list 
of functions and their volatility in the docs?


Joshua D. Drake




regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org




--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] psql: add volatility to \df+

2007-06-26 Thread Neil Conway
On Tue, 2007-26-06 at 20:29 -0700, Joshua D. Drake wrote:
 Tom Lane wrote:
  +1, but are there not any documentation changes to make?

Sure, I'll update the psql ref page.

 Well here is a question (just because I haven't seen it) is there a list 
 of functions and their volatility in the docs?

Not relevant to this thread, but AFAICS there is no explicit list of the
volatilities of the builtin functions.

-Neil



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] Cancel autovacuum conflicting with DROP TABLE

2007-06-26 Thread Alvaro Herrera
ITAGAKI Takahiro wrote:
 ITAGAKI Takahiro [EMAIL PROTECTED] wrote:
 
  Here is a patch that cancels autovacuum workers conflicting with
  DROP TABLE, TRUNCATE and CLUSTER. It was discussed here:
  http://archives.postgresql.org/pgsql-hackers/2007-06/msg00556.php
 
 I made an adjustment for the latest 'more autovacuum fixes' patch.
 http://archives.postgresql.org/pgsql-patches/2007-06/msg00217.php
 
  Now, autovacuum workers handle ProcDie signals using ERROR
 instead of FATAL. The exception is caught by the worker and
 converted to the following logs.
 
   SIGINT -- Cancel the current job.
   LOG: autovacuum on db.schema.table is canceled
   SIGTERM -- Cancel all jobs.
   LOG: autovacuum on db is canceled

Thanks for the patch.  I think there are actually three patches in here:

1. changing SIGINT so that it cancels the current table instead of
shutting down the entire worker.

2. changing DROP TABLE and TRUNCATE so that they cancel an autovac
worker by sending SIGINT.

3. change the interrupt code so that autovac workers are terminated with
ERROR instead of FATAL, knowing that the final outcome is the same.  If
I'm not mistaken the only point of the change is to be able to change
the error message, is that correct?


I don't feel very much comfortable with the patch (3).  I would prefer
that we keep errcode FATAL and select a different error message in
ProcessInterrupts instead.  I don't see the point in complicating the
sigjmp target without need.

I agree with the (2) change.

The change in (1) I don't feel comfortable with commenting.  It seems
strange to me, and although it seems like it would be safe, I cannot
help having a strange feeling about it.  I'll try to digest it a bit
more.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] psql: add volatility to \df+

2007-06-26 Thread Alvaro Herrera
Tom Lane wrote:
 Neil Conway [EMAIL PROTECTED] writes:
  Attached is a patch that adds information about function volatility to
  the output of psql's \df+ slash command. I'll apply this to HEAD
  tomorrow, barring any objections.
 
 +1, but are there not any documentation changes to make?

+1 as well but I'm wondering whether the output is too wide.  Is it
possible to find some shorter string to convey the same meaning?
Ideally, without being a single character.

-- 
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
Take out the book your religion considers the right one for finding the
prayer that brings peace to your soul. Then reboot the computer
and see if it works (Carlos Duclós)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match