Re: [PERFORM] rapid degradation after postmaster restart

2004-03-17 Thread Matthew T. O'Connor
Andrew Sullivan wrote:

The vacuum delay stuff that you're working on may help, but I can't
really believe it's your salvation if this is happening after only a
few minutes.  No matter how much you're doing inside those functions,
you surely can't be causing so many dead tuples that a vacuum is
necessary that soon.  Did you try not vacuuming for a little while to
see if it helps?
 

Some of this thread was taken off-list, so I'm not sure it was mentioned 
on the list, but a big part of the problem was that Joe was running into 
the same bug that Cott Lang ran into a while ago, which caused the vacuum 
threshold to get set far too low, resulting in vacuums far too often. 
This has been fixed and the patch has been committed; unfortunately it 
didn't make it into 7.4.2, but it will be in 7.4.3 / 7.5.

I didn't see it anywhere in this thread, but are you quite sure that
you're not swapping?  Note that vmstat on multiprocessor Solaris
machines is notoriously not useful.  You may want to have a look at
what the example stuff in the SE Toolkit tells you, or what you get
from sar.  I believe you have to use a special kernel setting on
Solaris to mark shared memory as being ineligible for swap.
 

I haven't heard from Joe how things are going with the fixed 
pg_autovacuum but that in combination with the vacuum delay stuff should 
work well.

Matthew



---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] rapid degradation after postmaster restart

2004-03-17 Thread Joe Conway
Arthur Ward wrote:
Jan's vacuum-delay-only patch that nobody can find is here:

http://archives.postgresql.org/pgsql-hackers/2003-11/msg00518.php

I've been using it in testing & production without any problems.
Great to know -- many thanks.

I've hacked my own vacuum-delay-only patch from Jan's all_performance 
patch. It looks like the only difference is that it uses usleep() 
instead of select(). So far the tests look promising.

Thanks,

Joe

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] rapid degradation after postmaster restart

2004-03-17 Thread Joe Conway
Andrew Sullivan wrote:
Sorry I haven't had a chance to reply to this sooner.

The vacuum delay stuff that you're working on may help, but I can't
really believe it's your salvation if this is happening after only a
few minutes.  No matter how much you're doing inside those functions,
you surely can't be causing so many dead tuples that a vacuum is
necessary that soon.  Did you try not vacuuming for a little while to
see if it helps?
I discussed it later in the thread, but we're adding about 400K rows per 
hour and deleting most of them after processing (note this is a 
commercial app, written and maintained by another department -- I can 
recommend changes, but this late into their release cycle they are very 
reluctant to change the app). This is 7 x 24 data collection from 
equipment, so there is no "slow" time to use as a maintenance window.

But since the server in question is a test machine, I was able to shut 
everything off long enough to do a full vacuum -- it took about 12 hours.

I didn't see it anywhere in this thread, but are you quite sure that
you're not swapping?  Note that vmstat on multiprocessor Solaris
machines is notoriously not useful.  You may want to have a look at
what the example stuff in the SE Toolkit tells you, or what you get
from sar.  I believe you have to use a special kernel setting on
Solaris to mark shared memory as being ineligible for swap.
I'm (reasonably) sure there is no swapping. Minimum free memory (from 
top) is about 800 MB, and "vmstat -S" shows no swap-in or swap-out.

I've been playing with a version of Jan's performance patch in the past 
few hours. Based on my simulations, it appears that a 1 ms delay every 
10 pages is just about right. The performance hit is negligible (based 
on overall test time, and cpu % used by the vacuum process). I still 
have a bit more analysis to do, but this is looking pretty good. More 
later...

Joe

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] rapid degradation after postmaster restart

2004-03-17 Thread Andrew Sullivan
Sorry I haven't had a chance to reply to this sooner.

On Fri, Mar 12, 2004 at 05:38:37PM -0800, Joe Conway wrote:
> The problem is this: the application runs an insert, that fires off a 
> trigger, that cascades into a fairly complex series of functions, that 
> do a bunch of calculations, inserts, updates, and deletes. Immediately 
> after a postmaster restart, the first insert or two take about 1.5 
> minutes (undoubtedly this could be improved, but it isn't the main 
> issue). However by the second or third insert, the time increases to 7 - 
> 9 minutes. Restarting the postmaster causes the cycle to repeat, i.e. 
> the first one or two inserts are back to the 1.5 minute range.

The vacuum delay stuff that you're working on may help, but I can't
really believe it's your salvation if this is happening after only a
few minutes.  No matter how much you're doing inside those functions,
you surely can't be causing so many dead tuples that a vacuum is
necessary that soon.  Did you try not vacuuming for a little while to
see if it helps?

I didn't see it anywhere in this thread, but are you quite sure that
you're not swapping?  Note that vmstat on multiprocessor Solaris
machines is notoriously not useful.  You may want to have a look at
what the example stuff in the SE Toolkit tells you, or what you get
from sar.  I believe you have to use a special kernel setting on
Solaris to mark shared memory as being ineligible for swap.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
This work was visionary and imaginative, and goes to show that visionary
and imaginative work need not end up well. 
--Dennis Ritchie

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] rapid degradation after postmaster restart

2004-03-17 Thread Arthur Ward
> The problem with Jan's more complex version of the patch (at least the
> one I found - perhaps not the right one) is it includes a bunch of other
> experimental stuff that I'd not want to mess with at the moment. Would
> changing the input units (for the original patch) from milli-secs to
> micro-secs be a bad idea? If so, I guess I'll get to extracting what I
> need from Jan's patch.

Jan's vacuum-delay-only patch that nobody can find is here:

http://archives.postgresql.org/pgsql-hackers/2003-11/msg00518.php

I've been using it in testing & production without any problems.

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] rapid degradation after postmaster restart

2004-03-16 Thread Joe Conway
Matthew T. O'Connor wrote:
If memory serves, the problem is that you actually sleep 10ms even when
you set it to 1.  One of the things changed in Jan's later patch was the
ability to specify how many pages to work on before sleeping, rather
than how long to sleep in between every single page.  You might be able to do
a quick hack and have it do 10 pages or so before sleeping.
I thought I remembered something about that.

It turned out to be less difficult than I first thought to extract the 
vacuum delay stuff from Jan's performance patch. I haven't yet tried it 
out, but it's attached in case you are interested. I'll report back once 
I have some results.

Joe
Index: src/backend/access/nbtree/nbtree.c
===
RCS file: /home/pgsql/CvsRoot/pgsql-server/src/backend/access/nbtree/nbtree.c,v
retrieving revision 1.106
diff -c -b -r1.106 nbtree.c
*** src/backend/access/nbtree/nbtree.c  2003/09/29 23:40:26 1.106
--- src/backend/access/nbtree/nbtree.c  2003/11/03 17:56:54
***
*** 18,23 
--- 18,25 
   */
  #include "postgres.h"
  
+ #include <unistd.h>
+ 
  #include "access/genam.h"
  #include "access/heapam.h"
  #include "access/nbtree.h"
***
*** 27,32 
--- 29,39 
  #include "storage/smgr.h"
  
  
+ extern int vacuum_page_delay;
+ extern int vacuum_page_groupsize;
+ extern int vacuum_page_groupcount;
+ 
+ 
  /* Working state for btbuild and its callback */
  typedef struct
  {
***
*** 610,615 
--- 617,631 
  
CHECK_FOR_INTERRUPTS();
  
+   if (vacuum_page_delay > 0)
+   {
+   if (++vacuum_page_groupcount >= vacuum_page_groupsize)
+   {
+   vacuum_page_groupcount = 0;
+   usleep(vacuum_page_delay * 1000);
+   }
+   }
+ 
ndeletable = 0;
page = BufferGetPage(buf);
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
***
*** 736,741 
--- 752,768 
Buffer  buf;
Pagepage;
BTPageOpaque opaque;
+ 
+   CHECK_FOR_INTERRUPTS();
+ 
+   if (vacuum_page_delay > 0)
+   {
+   if (++vacuum_page_groupcount >= vacuum_page_groupsize)
+   {
+   vacuum_page_groupcount = 0;
+   usleep(vacuum_page_delay * 1000);
+   }
+   }
  
buf = _bt_getbuf(rel, blkno, BT_READ);
page = BufferGetPage(buf);
Index: src/backend/commands/vacuumlazy.c
===
RCS file: /home/pgsql/CvsRoot/pgsql-server/src/backend/commands/vacuumlazy.c,v
retrieving revision 1.32
diff -c -b -r1.32 vacuumlazy.c
*** src/backend/commands/vacuumlazy.c   2003/09/25 06:57:59 1.32
--- src/backend/commands/vacuumlazy.c   2003/11/03 17:57:27
***
*** 37,42 
--- 37,44 
   */
  #include "postgres.h"
  
+ #include <unistd.h>
+ 
  #include "access/genam.h"
  #include "access/heapam.h"
  #include "access/xlog.h"
***
*** 88,93 
--- 90,99 
  static TransactionId OldestXmin;
  static TransactionId FreezeLimit;
  
+ int   vacuum_page_delay = 0;  /* milliseconds per page group */
+ int   vacuum_page_groupsize = 10; /* group size */
+ int   vacuum_page_groupcount = 0; /* current group size count */
+ 
  
  /* non-export function prototypes */
  static void lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
***
*** 228,233 
--- 234,248 
  
CHECK_FOR_INTERRUPTS();
  
+   if (vacuum_page_delay > 0)
+   {
+   if (++vacuum_page_groupcount >= vacuum_page_groupsize)
+   {
+   vacuum_page_groupcount = 0;
+   usleep(vacuum_page_delay * 1000);
+   }
+   }
+ 
/*
 * If we are close to overrunning the available space for
 * dead-tuple TIDs, pause and do a cycle of vacuuming before we
***
*** 469,474 
--- 484,498 
  
CHECK_FOR_INTERRUPTS();
  
+   if (vacuum_page_delay > 0)
+   {
+   if (++vacuum_page_groupcount >= vacuum_page_groupsize)
+   {
+   vacuum_page_groupcount = 0;
+   usleep(vacuum_page_delay * 1000);
+   }
+   }
+ 
tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]);
buf = ReadBuffer(onerel, tblk);

Re: [PERFORM] rapid degradation after postmaster restart

2004-03-16 Thread Tom Lane
Joe Conway <[EMAIL PROTECTED]> writes:
> I have tested Tom's original patch now. The good news -- it works great 
> in terms of reducing the load imposed by vacuum -- almost to the level 
> of being unnoticeable. The bad news -- in a simulation test which loads 
> an hour's worth of data, even with delay set to 1 ms, vacuum of the 
> large table exceeds two hours (vs 12-14 minutes with delay = 0). Since 
> that hourly load is expected 7 x 24, this obviously isn't going to work.

Turned the dial down a bit too far then ...

> The problem with Jan's more complex version of the patch (at least the 
> one I found - perhaps not the right one) is it includes a bunch of other 
> experimental stuff that I'd not want to mess with at the moment. Would 
> changing the input units (for the original patch) from milli-secs to 
> micro-secs be a bad idea?

Unlikely to be helpful; on most kernels the minimum sleep delay is 1 or
10 msec, so asking for a few microsec is the same as asking for some
millisec.  I think what you need is a knob of the form "sleep N msec
after each M pages of I/O".  I'm almost certain that Jan posted such a
patch somewhere between my original and the version you refer to above.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] rapid degradation after postmaster restart

2004-03-16 Thread Matthew T. O'Connor
On Tue, 2004-03-16 at 23:49, Joe Conway wrote:
I have tested Tom's original patch now. The good news -- it works great 
in terms of reducing the load imposed by vacuum -- almost to the level 
of being unnoticeable. The bad news -- in a simulation test which loads 
an hour's worth of data, even with delay set to 1 ms, vacuum of the 
large table exceeds two hours (vs 12-14 minutes with delay = 0). Since 
that hourly load is expected 7 x 24, this obviously isn't going to work.
If memory serves, the problem is that you actually sleep 10ms even when
you set it to 1.  One of the things changed in Jan's later patch was the
ability to specify how many pages to work on before sleeping, rather
than how long to sleep in between every single page.  You might be able to do
a quick hack and have it do 10 pages or so before sleeping.
Matthew

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] rapid degradation after postmaster restart

2004-03-16 Thread Joe Conway
Tom Lane wrote:
Joe Conway <[EMAIL PROTECTED]> writes:

Any idea where I can get my hands on the latest version. I found the 
original post from Tom, but I thought there was a later version with 
both number of pages and time to sleep as knobs.
That was as far as I got.  I think Jan posted a more complex version
that would still be reasonable to apply to 7.4.
I have tested Tom's original patch now. The good news -- it works great 
in terms of reducing the load imposed by vacuum -- almost to the level 
of being unnoticeable. The bad news -- in a simulation test which loads 
an hour's worth of data, even with delay set to 1 ms, vacuum of the 
large table exceeds two hours (vs 12-14 minutes with delay = 0). Since 
that hourly load is expected 7 x 24, this obviously isn't going to work.

The problem with Jan's more complex version of the patch (at least the 
one I found - perhaps not the right one) is it includes a bunch of other 
experimental stuff that I'd not want to mess with at the moment. Would 
changing the input units (for the original patch) from milli-secs to 
micro-secs be a bad idea? If so, I guess I'll get to extracting what I 
need from Jan's patch.

Thanks,

Joe

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] rapid degradation after postmaster restart

2004-03-15 Thread Joe Conway
Matthew T. O'Connor wrote:
Strange... I wonder if this is some integer overflow problem.  There was 
one reported recently and fixed as of CVS head yesterday, you might try 
that, however without the -d2 output I'm only guessing at why 
pg_autovacuum is vacuuming so much / so often.
I'll see what I can do tomorrow to track it down.

I have already recommended to the program manager that they switch to 
7.4.2 plus the autovacuum patch. Not sure they will be willing to make 
any changes at this stage in their release process though.

If we can't find one, any chance you can 
do some testing with CVS HEAD just to see if that works any better.  I 
know there has been a fair amount of work done to improve this situation 
(not just vacuum delay, but ARC etc...)
I might do that, but not likely on Solaris. I can probably get a copy of 
the current database and testing scripts, and give it a try on one of my 
own machines (all Linux, either RHAS3, RH9, or Fedora).

Joe

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] rapid degradation after postmaster restart

2004-03-15 Thread Joe Conway
Tom Lane wrote:
Joe Conway <[EMAIL PROTECTED]> writes:
Any idea where I can get my hands on the latest version. I found the 
original post from Tom, but I thought there was a later version with 
both number of pages and time to sleep as knobs.
That was as far as I got.  I think Jan posted a more complex version
that would still be reasonable to apply to 7.4.
I thought that too, but was having trouble finding it. I'll look again.

Thanks,

Joe

---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] rapid degradation after postmaster restart

2004-03-15 Thread Matthew T. O'Connor
Joe Conway wrote:

Yeah, I'm sure. Snippets from the log:

[...lots-o-tables...]
[2004-03-14 12:44:48 PM] added table: specdb."public"."parametric_states"
[2004-03-14 12:49:48 PM] Performing: VACUUM ANALYZE 
"public"."transaction_data"
[2004-03-14 01:29:59 PM] Performing: VACUUM ANALYZE 
"public"."transaction_data"
[2004-03-14 02:08:26 PM] Performing: ANALYZE "public"."out_of_spec"
[2004-03-14 02:08:26 PM] Performing: VACUUM ANALYZE 
"public"."transaction_data"
[2004-03-14 02:22:44 PM] Performing: VACUUM ANALYZE "public"."spc_graphs"
[2004-03-14 03:06:45 PM] Performing: VACUUM ANALYZE 
"public"."out_of_spec"
[2004-03-14 03:06:45 PM] Performing: VACUUM ANALYZE 
"public"."transaction_data"
[2004-03-14 03:19:51 PM] Performing: VACUUM ANALYZE "public"."spc_graphs"
[2004-03-14 03:21:09 PM] Performing: ANALYZE "public"."parametric_states"
[2004-03-14 03:54:57 PM] Performing: ANALYZE "public"."out_of_spec"
[2004-03-14 03:54:57 PM] Performing: VACUUM ANALYZE 
"public"."transaction_data"
[2004-03-14 04:07:52 PM] Performing: VACUUM ANALYZE "public"."spc_graphs"
[2004-03-14 04:09:33 PM] Performing: ANALYZE 
"public"."equip_status_history"
[2004-03-14 04:09:33 PM] Performing: VACUUM ANALYZE 
"public"."parametric_states"
[2004-03-14 04:43:46 PM] Performing: VACUUM ANALYZE 
"public"."out_of_spec"
[2004-03-14 04:43:46 PM] Performing: VACUUM ANALYZE 
"public"."transaction_data"
[2004-03-14 04:56:35 PM] Performing: VACUUM ANALYZE "public"."spc_graphs"
[2004-03-14 04:58:32 PM] Performing: ANALYZE "public"."parametric_states"
[2004-03-14 05:28:58 PM] added database: specdb


Yeah, you're right.

This is the entire period of the first test, with default autovac 
settings. The table "public"."transaction_data" is the one with 28 
million active rows. The entire test run inserts about 600 x 600 = 
360,000 rows, out of which roughly two-thirds are later deleted.


Strange... I wonder if this is some integer overflow problem.  There was 
one reported recently and fixed as of CVS head yesterday, you might try 
that, however without the -d2 output I'm only guessing at why 
pg_autovacuum is vacuuming so much / so often.

I can try. The server belongs to another department, and they are 
under the gun to get back on track with their testing. Also, they 
compiled without debug symbols, so I need to get permission to recompile.


Good luck, I hope you can get permission.  Would be nice to fix this 
little crash.

Yes I would be very curious to see the results with the vacuum delay 
patch installed (is that patch applied to HEAD?)


Any idea where I can get my hands on the latest version. I found the 
original post from Tom, but I thought there was a later version with 
both number of pages and time to sleep as knobs.


I think Jan posted one a while back  [searches archives...]  But I 
must say I'm at a loss to find it in the archives.  Anyone know where a 
good delay patch is for 7.4?   If we can't find one, any chance you can 
do some testing with CVS HEAD just to see if that works any better.  I 
know there has been a fair amount of work done to improve this situation 
(not just vacuum delay, but ARC etc...)

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] rapid degradation after postmaster restart

2004-03-15 Thread Tom Lane
Joe Conway <[EMAIL PROTECTED]> writes:
> Any idea where I can get my hands on the latest version. I found the 
> original post from Tom, but I thought there was a later version with 
> both number of pages and time to sleep as knobs.

That was as far as I got.  I think Jan posted a more complex version
that would still be reasonable to apply to 7.4.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] rapid degradation after postmaster restart

2004-03-15 Thread Joe Conway
Matthew T. O'Connor wrote:
I think you understand correctly.  A table with 1,000,000 rows should 
get vacuumed approx every 2,000,000 changes (assuming default values for 
-V ).  FYI, an insert and a delete each count as one change, but an update 
counts as two.

Unfortunately, running with -d2 would show the numbers that 
pg_autovacuum is using to decide if/when it should vacuum or 
analyze.  Also, are you sure that it vacuumed more than once and 
wasn't doing analyzes most of the time?
Yeah, I'm sure. Snippets from the log:

[...lots-o-tables...]
[2004-03-14 12:44:48 PM] added table: specdb."public"."parametric_states"
[2004-03-14 12:49:48 PM] Performing: VACUUM ANALYZE 
"public"."transaction_data"
[2004-03-14 01:29:59 PM] Performing: VACUUM ANALYZE 
"public"."transaction_data"
[2004-03-14 02:08:26 PM] Performing: ANALYZE "public"."out_of_spec"
[2004-03-14 02:08:26 PM] Performing: VACUUM ANALYZE 
"public"."transaction_data"
[2004-03-14 02:22:44 PM] Performing: VACUUM ANALYZE "public"."spc_graphs"
[2004-03-14 03:06:45 PM] Performing: VACUUM ANALYZE "public"."out_of_spec"
[2004-03-14 03:06:45 PM] Performing: VACUUM ANALYZE 
"public"."transaction_data"
[2004-03-14 03:19:51 PM] Performing: VACUUM ANALYZE "public"."spc_graphs"
[2004-03-14 03:21:09 PM] Performing: ANALYZE "public"."parametric_states"
[2004-03-14 03:54:57 PM] Performing: ANALYZE "public"."out_of_spec"
[2004-03-14 03:54:57 PM] Performing: VACUUM ANALYZE 
"public"."transaction_data"
[2004-03-14 04:07:52 PM] Performing: VACUUM ANALYZE "public"."spc_graphs"
[2004-03-14 04:09:33 PM] Performing: ANALYZE "public"."equip_status_history"
[2004-03-14 04:09:33 PM] Performing: VACUUM ANALYZE 
"public"."parametric_states"
[2004-03-14 04:43:46 PM] Performing: VACUUM ANALYZE "public"."out_of_spec"
[2004-03-14 04:43:46 PM] Performing: VACUUM ANALYZE 
"public"."transaction_data"
[2004-03-14 04:56:35 PM] Performing: VACUUM ANALYZE "public"."spc_graphs"
[2004-03-14 04:58:32 PM] Performing: ANALYZE "public"."parametric_states"
[2004-03-14 05:28:58 PM] added database: specdb

This is the entire period of the first test, with default autovac 
settings. The table "public"."transaction_data" is the one with 28 
million active rows. The entire test run inserts about 600 x 600 = 
360,000 rows, out of which roughly two-thirds are later deleted.

That's unfortunate as that is the detail we need to see what 
pg_autovacuum thinks is really going on.  We had a similar sounding 
crash on FreeBSD due to some uninitialized variables that were being 
printed out by the debug code, however that was fixed a long time ago.  
Any chance you can look into this?
I can try. The server belongs to another department, and they are under 
the gun to get back on track with their testing. Also, they compiled 
without debug symbols, so I need to get permission to recompile.

Yes I would be very curious to see the results with the vacuum delay 
patch installed (is that patch applied to HEAD?)
Any idea where I can get my hands on the latest version. I found the 
original post from Tom, but I thought there was a later version with 
both number of pages and time to sleep as knobs.

Thanks,

Joe

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] rapid degradation after postmaster restart

2004-03-15 Thread Matthew T. O'Connor
Joe Conway wrote:

A few pg_autovacuum questions came out of this:

  First, the default vacuum scaling factor is 2, which I think implies
  the big table should only get vacuumed every 56 million or so changes.
  I didn't come anywhere near that volume in my tests, yet the table did
  get vacuumed more than once (I was watching the pg_autovacuum log
  output). Do I misunderstand this setting?


I think you understand correctly.  A table with 1,000,000 rows should 
get vacuumed approx every 2,000,000 changes (assuming default values for 
-V ).  FYI, an insert and a delete each count as one change, but an update 
counts as two.

Unfortunately, running with -d2 would show the numbers that 
pg_autovacuum is using to decide if/when it should vacuum or 
analyze.  Also, are you sure that it vacuumed more than once and 
wasn't doing analyzes most of the time?

Also, I'm not sure if 2 is a good default value for the scaling factor 
but I erred on the side of not vacuuming too often.

  Second, Matthew requested pg_autovacuum run with -d2; I found that
  with -d2 set, pg_autovacuum would immediately exit on start. -d0 and
  -d1 work fine however.


That's unfortunate as that is the detail we need to see what 
pg_autovacuum thinks is really going on.  We had a similar sounding 
crash on FreeBSD due to some uninitialized variables that were being 
printed out by the debug code, however that was fixed a long time ago.  
Any chance you can look into this?

That's all I can think of at the moment. I'd like to try the 7.4 patch 
that makes vacuum sleep every few pages -- can anyone point me to the 
latest and greatest that will apply to 7.4?


Yes I would be very curious to see the results with the vacuum delay 
patch installed (is that patch applied to HEAD?)



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] rapid degradation after postmaster restart

2004-03-13 Thread Matthew T. O'Connor
Joe Conway wrote:

Tom Lane wrote:

Just to be clear on this: you have to restart the postmaster to bring
the time back down?  Simply starting a fresh backend session doesn't do
it?


IIRC, shared buffers was reasonable, maybe 128MB. One thing that is 
worthy of note is that they are using pg_autovacuum and a very low 
vacuum_mem setting (1024). But I also believe that max_fsm_relations 
and max_fsm_pages have been bumped up from default (something like 
1 & 20).

pg_autovacuum could be a problem if it's vacuuming too often.  Have you 
looked to see if a vacuum or analyze is running while the server is 
slow?  If so, have you played with the pg_autovacuum default vacuum and 
analyze thresholds?  If it appears that it is related to pg_autovacuum 
please send me the command options used to run it and a logfile of it's 
output running at at a debug level of -d2

Matthew

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] rapid degradation after postmaster restart

2004-03-13 Thread Josh Berkus
Joe,

> IIRC, shared buffers was reasonable, maybe 128MB. One thing that is 
> worthy of note is that they are using pg_autovacuum and a very low 
> vacuum_mem setting (1024). But I also believe that max_fsm_relations and 
> max_fsm_pages have been bumped up from default (something like 1 & 
> 20).

pg_autovacuum may be your problem.   Imagine this:

1) The chain of updates and inserts called by the procedures makes enough 
changes, on its own, to trigger pg_autovacuum.
2) Because they have a big database, and a low vacuum_mem, a vacuum of the 
largest table takes noticable time, like several minutes.
3) This means that the vacuum is still running during the second and 
succeeding events 

Something to check by watching the process list.

FWIW, I don't use pg_autovacuum for databases which have frequent large batch 
updates; I find it results in uneven performance.

Feel free to phone me if you're still stuck!

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] rapid degradation after postmaster restart

2004-03-13 Thread Joe Conway
Marty Scholes wrote:
I have seen similar results to what you are describing.

I found that running a full vacuum:

vacuumdb -fza

followed by a checkpoint makes it run fast again.

Try timing the update with and without a full vacuum.
Will do. I'll let you know how it goes.

Thanks for the reply.

Joe

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] rapid degradation after postmaster restart

2004-03-13 Thread Tom Lane
Joe Conway <[EMAIL PROTECTED]> writes:
> ... Immediately 
> after a postmaster restart, the first insert or two take about 1.5 
> minutes (undoubtedly this could be improved, but it isn't the main 
> issue). However by the second or third insert, the time increases to 7 - 
> 9 minutes. Restarting the postmaster causes the cycle to repeat, i.e. 
> the first one or two inserts are back to the 1.5 minute range.

Just to be clear on this: you have to restart the postmaster to bring
the time back down?  Simply starting a fresh backend session doesn't do
it?

Are you using particularly large values for shared_buffers or any of the
other resource parameters?

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] rapid degradation after postmaster restart

2004-03-12 Thread Tom Lane
Joe Conway <[EMAIL PROTECTED]> writes:
> The problem is this: the application runs an insert, that fires off a 
> trigger, that cascades into a fairly complex series of functions, that 
> do a bunch of calculations, inserts, updates, and deletes. Immediately 
> after a postmaster restart, the first insert or two take about 1.5 
> minutes (undoubtedly this could be improved, but it isn't the main 
> issue). However by the second or third insert, the time increases to 7 - 
> 9 minutes. Restarting the postmaster causes the cycle to repeat, i.e. 
> the first one or two inserts are back to the 1.5 minute range.

I realize this question might take some patience to answer, but what
does the performance curve look like beyond three trials?  Does it level
off or continue to get worse?  If it doesn't level off, does the
degradation seem linear in the number of trials, or worse than linear?

I have no ideas in mind, just trying to gather data ...

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] rapid degradation after postmaster restart

2004-03-12 Thread Marty Scholes
Six days ago I installed Pg 7.4.1 on Sparc Solaris 8 also.  I am hopeful 
that we as well can migrate a bunch of our apps from Oracle.

After doing some informal benchmarks and performance testing for the 
past week I am becoming more and more impressed with what I see.

I have seen similar results to what you are describing.

I found that running a full vacuum:

vacuumdb -fza

followed by a checkpoint makes it run fast again.

Try timing the update with and without a full vacuum.

I can't help but wonder if a clean shutdown includes some vacuuming.

Obviously, in a production database this would be an issue.

Please post back what you learn.

Sincerely,
Marty

Joe Conway wrote:
I'm trying to troubleshoot a performance issue on an application ported 
from Oracle to postgres. Now, I know the best way to get help is to post 
the schema, explain analyze output, etc, etc -- unfortunately I can't do 
that at the moment. However, maybe someone can point me in the right 
direction to figure this out on my own. That said, here are a few 
details...

PostgreSQL 7.4.1
bash-2.03$ uname -a
SunOS col65 5.8 Generic_108528-27 sun4u sparc SUNW,Sun-Fire-280R
The problem is this: the application runs an insert, that fires off a 
trigger, that cascades into a fairly complex series of functions, that 
do a bunch of calculations, inserts, updates, and deletes. Immediately 
after a postmaster restart, the first insert or two take about 1.5 
minutes (undoubtedly this could be improved, but it isn't the main 
issue). However by the second or third insert, the time increases to 7 - 
9 minutes. Restarting the postmaster causes the cycle to repeat, i.e. 
the first one or two inserts are back to the 1.5 minute range.

Any ideas spring to mind? I don't have much experience with Postgres on 
Solaris -- could it be related to that somehow?

Thanks for any insights.

Joe

---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])