[PERFORM] perf pb solved only after pg_dump and restore

2006-08-28 Thread Guillaume Cottenceau
Hi,

We noticed a slowdown in our application while traffic was kinda
heavy. The logic after reading the docs commanded us to trim the
enlarged tables, run VACUUM ANALYZE and then expect fast
performance again; but it wasn't the case[1].

Out of the blue, we dumped the database, removed it, recreated
from the restore, and now the performance is lightning fast
again.

Does it look familiar to anyone? I thought running VACUUM ANALYZE
after a trim should be enough so that pg has assembled the data
and has good statistical knowledge of the tables' contents..
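For reference, the trim itself is nothing fancy; it amounts to something
like this (table and column names here are made up, ours differ):

```sql
-- Drop log entries older than the retention window, then re-analyze
DELETE FROM app_log WHERE logged_at < now() - interval '30 days';
VACUUM ANALYZE app_log;
```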

Thanks for any tips.

Ref: 
[1] Processes were always showing one/some postmaster on SELECT,
a constant load of 1, and vmstat always showing activity in
IO blocks out (the application generates all sorts of typical
statements, some SELECT, UPDATE, INSERT, either directly or
through stored procedures)

-- 
Guillaume Cottenceau
Create your personal SMS or WAP Service - visit http://mobilefriends.ch/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] perf pb solved only after pg_dump and restore

2006-08-28 Thread Markus Schaber
Hi, Guillaume

Guillaume Cottenceau wrote:

 We noticed a slowdown in our application while traffic was kinda
 heavy. The logic after reading the docs commanded us to trim the
 enlarged tables, run VACUUM ANALYZE and then expect fast
 performance again; but it wasn't the case[1].

What exactly do you mean by "trim the enlarged tables"?

 Out of the blue, we dumped the database, removed it, recreated
 from the restore, and now the performance is lightning fast
 again.
 
 Does it look familiar to anyone? I thought running VACUUM ANALYZE
 after a trim should be enough so that pg has assembled the data
 and has good statistical knowledge of the tables' contents..

This looks like either your free_space_map setting is way too low, or you
have index bloat.

Maybe a VACUUM FULL followed by a REINDEX would have solved your problem.

It also might make sense to issue a CLUSTER instead (which combines the
effects of VACUUM FULL, REINDEX and physically reordering the data).

When the free_space_map is too low, VACUUM ANALYZE should have told you
via a warning (at least, if your logging is set appropriately).
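For the record, something along these lines should do it (table and index
names are placeholders; note that both commands take an exclusive lock on
the table):

```sql
-- Compact the table and return free space to the OS (exclusive lock!)
VACUUM FULL VERBOSE your_log_table;
-- Rebuild all indexes of the table to get rid of index bloat
REINDEX TABLE your_log_table;

-- Or, alternatively, rewrite table and indexes in index order
-- (7.4 syntax: CLUSTER indexname ON tablename)
CLUSTER your_log_table_pkey ON your_log_table;
```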


HTH,
Markus
-- 
Markus Schaber | Logical Tracking & Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org



Re: [PERFORM] perf pb solved only after pg_dump and restore

2006-08-28 Thread Guillaume Cottenceau
Hi Markus,

Thanks for your message.

 Guillaume Cottenceau wrote:
 
  We noticed a slowdown in our application while traffic was kinda
  heavy. The logic after reading the docs commanded us to trim the
  enlarged tables, run VACUUM ANALYZE and then expect fast
  performance again; but it wasn't the case[1].
 
 What exactly do you mean by "trim the enlarged tables"?

We have a couple of logs files which get larger over time
(millions of rows). As they are log files, they can be trimmed
from older values.
 
  Out of the blue, we dumped the database, removed it, recreated
  from the restore, and now the performance is lightning fast
  again.
  
  Does it look familiar to anyone? I thought running VACUUM ANALYZE
  after a trim should be enough so that pg has assembled the data
  and has good statistical knowledge of the tables' contents..
 
 This looks like either your free_space_map setting is way too low, or you

I don't know much about free_space_map. Trying to search in
documentation, I found run time configuration of the two
following parameters for which the current values follow:

 max_fsm_pages is 20000
 max_fsm_relations is 1000

Do they look low?

Notice: table data is only 600M after trim (without indexes),
while it was probably 3x to 10x this size before the trim.
Machine is a 2G Dell 1850 with lsi logic megaraid.

 have index bloat.

Can you elaborate? I have created a couple of indexes (according
to multiple models of use in our application) and they do take up
quite some disk space (table dump is 600M but after restore it
takes up 1.5G on disk) but I thought they could only do good or
never be used, not impair performance..

 Maybe a VACUUM FULL followed by a REINDEX would have solved your problem.

So these would have reordered the data for faster sequential
access which is not the case of VACUUM ANALYZE?
 
 It also might make sense to issue a CLUSTER instead (which combines the
 effects of VACUUM FULL, REINDEX and physically reordering the data).

I was reluctant in using CLUSTER because you have to choose an
index and there are multiple indexes on the large tables..

 When the free_space_map is too low, VACUUM ANALYZE should have told you
 via a warning (at least, if your logging is set appropriately).

Unfortunately, we didn't keep the logs of VACUUM ANALYZE, so I
can't be sure :/

-- 
Guillaume Cottenceau
Create your personal SMS or WAP Service - visit http://mobilefriends.ch/



Re: [PERFORM] perf pb solved only after pg_dump and restore

2006-08-28 Thread Guillaume Smet

Guillaume,

On 28 Aug 2006 11:43:16 +0200, Guillaume Cottenceau [EMAIL PROTECTED] wrote:

 max_fsm_pages is 20000
 max_fsm_relations is 1000
Do they look low?


Yes, they are probably too low if you don't run VACUUM on a regular
basis and you have a lot of UPDATE/DELETE activity. The FSM doesn't take
a lot of memory, so it's usually recommended to have a comfortable value
for it.

I usually recommend to read:
http://www.pervasive-postgres.com/instantkb13/article.aspx?id=10116&cNode=5K1C3W
http://www.pervasive-postgres.com/instantkb13/article.aspx?id=10087&cNode=5K1C3W
to understand better what VACUUM and FSM mean.


Can you elaborate? I have created a couple of indexes (according
to multiple models of use in our application) and they do take up
quite some disk space (table dump is 600M but after restore it
takes up 1.5G on disk) but I thought they could only do good or
never be used, not impair performance..


Indexes slow down write activity (you have to maintain them). It's not
always a good idea to create them.


 Maybe a VACUUM FULL followed by a REINDEX would have solved your problem.

So these would have reordered the data for faster sequential
access which is not the case of VACUUM ANALYZE?


VACUUM ANALYZE won't help you if your database is completely bloated.
And AFAICS you're not running it on a regular basis so your database
was probably completely bloated which means:
- bloated indexes,
- bloated tables (ie a lot of fragmentation in the pages which means
that you need far more pages to store the same data).

The only ways to solve this situation are either to dump/restore or run
a VACUUM FULL ANALYZE (VERBOSE is better, to keep a log), and
possibly reindex any bloated index (depends on your situation).
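As a sketch (the table name is a placeholder, and keep the output):

```sql
-- Run database-wide; redirect the VERBOSE output to a file for later analysis
VACUUM FULL ANALYZE VERBOSE;
-- Then, only for the tables whose indexes the output shows as bloated:
REINDEX TABLE your_log_table;
```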


 When the free_space_map is too low, VACUUM ANALYZE should have told you
 via a warning (at least, if your logging is set appropriately).

Unfortunately, we didn't keep the logs of VACUUM ANALYZE, so I
can't be sure :/


You should really run VACUUM ANALYZE VERBOSE on a regular basis and
analyze the logs to be sure your VACUUM strategy and FSM settings are
OK.
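A simple cron entry is enough; for instance (hypothetical paths and
database name):

```
# crontab: nightly VACUUM ANALYZE VERBOSE, keeping the output for analysis
30 3 * * * psql -d yourdb -c 'VACUUM ANALYZE VERBOSE' >> /var/log/pg_vacuum.log 2>&1
```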

I developed http://pgfouine.projects.postgresql.org/vacuum.html to
help us doing it on our production databases.

Regards,

--
Guillaume



Re: [PERFORM] perf pb solved only after pg_dump and restore

2006-08-28 Thread Markus Schaber
Hi, Guillaume,

Guillaume Cottenceau wrote:

 We have a couple of logs files which get larger over time
 (millions of rows). As they are log files, they can be trimmed
 from older values.

Ah, ok, you DELETEd the old rows.

So I assume that you never UPDATE, but only INSERT new entries and
sometimes DELETE a big bunch of entries from the beginning.

This is a special usage pattern for which the normal VACUUM is not
well suited.

DELETEing rows itself does not free any space. Only after your
transaction is committed does a following VACUUM FULL or CLUSTER
actually free the space.

VACUUM and VACUUM ANALYZE only remove obsolete rows from the pages and
mark them free (by entering them into the free space map, as long as
that one is large enough). That means that your table will actually stay
as large as before, having 90% of free pages at the beginning and 10%
used pages at the very end. New INSERTs and UPDATEs will prefer to use
pages from the free space map before allocating new pages, but the
existing rows will stay where they are.

Now, VACUUM FULL actively moves rows to the beginning of the table,
allowing to cut the end of the table, while CLUSTER recreates the table
from scratch, in index order. Both lead to a compact storage, having all
used rows at the beginning, and no free pages.

So, I think, in your case VACUUM FULL and CLUSTER would both have solved
your problem.

  max_fsm_pages is 2
 Do they look low?
 Notice: table data is only 600M after trim (without indexes),
 while it was probably 3x to 10x this size before the trim.

10x the size means 6G, so 5.4G of data was freed by the trim. Each page
is 8k in size, so the fsm needs about 675000 pages. So, yes, for your
usage, they look low, and give very suboptimal results.
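In other words (the arithmetic above uses round decimal units; with the
actual PostgreSQL page size of 8192 bytes the figure comes out a bit
lower):

```python
# Back-of-the-envelope FSM sizing, following the decimal approximations
# used above (600M of live data, up to 10x that before the trim).
table_after = 600e6                       # bytes of live data after the trim
table_before = 10 * table_after           # upper estimate before the trim
freed_bytes = table_before - table_after  # 5.4e9 bytes freed by the trim

pages_needed = freed_bytes / 8e3          # "8k" taken as 8000 bytes
print(int(pages_needed))                  # 675000 free pages to track

# With PostgreSQL's actual page size of 8192 bytes, slightly fewer:
print(int(freed_bytes / 8192))            # ~659000
```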

 have index bloat.
 
 Can you elaborate? I have created a couple of indexes (according
 to multiple models of use in our application) and they do take up
 quite some disk space (table dump is 600M but after restore it
 takes up 1.5G on disk) but I thought they could only do good or
 never be used, not impair performance..

Like tables, indices may suffer from getting bloated by old, unused
entries. Especially the GIST based indices in 7.4 (used by PostGIS and
other plugins) suffered from that problem[1], but recent PostgreSQL
versions have improved in this area.

Now, when the query planner decides to use an index, the index access is
extremely slow because of all the deleted entries the index scan has to
skip.

However, from the additional information you gave above, I doubt it was
index bloat.

 Maybe a VACUUM FULL followed by a REINDEX would have solved your problem.
 
 So these would have reordered the data for faster sequential
 access which is not the case of VACUUM ANALYZE?

A VACUUM FULL would have reordered the data, and a REINDEX would have
optimized the index.

 It also might make sense to issue a CLUSTER instead (which combines the
 effects of VACUUM FULL, REINDEX and physically reordering the data).
 
 I was reluctant in using CLUSTER because you have to choose an
 index and there are multiple indexes on the large tables..

Usually, CLUSTERing on one index does not necessarily slow down accesses
on other indices, compared to the non-clustered (= random) table before.

If you have some indices that are somehow related (e.g. a timestamp and
a serial number), CLUSTERing on one index does automatically help the
other index, especially as the query planner uses correlation statistics.

Btw, if your queries often include 2 or 3 columns, a multi-column index
(and clustering on that index) might be the best.
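For instance, assuming a log table with a timestamp and a serial id
(hypothetical names):

```sql
-- One multi-column index covering the common access pattern
CREATE INDEX app_log_ts_id_idx ON app_log (logged_at, id);
-- Cluster the table on it (7.4 syntax: CLUSTER indexname ON tablename)
CLUSTER app_log_ts_id_idx ON app_log;
```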

 When the free_space_map is too low, VACUUM ANALYZE should have told you
 via a warning (at least, if your logging is set appropriately).
 
 Unfortunately, we didn't keep the logs of VACUUM ANALYZE, so I
 can't be sure :/

AFAIK, the warning is also output on the psql command line.

HTH,
Markus

[1] We once had an index that was about 100 times larger before REINDEX.

-- 
Markus Schaber | Logical Tracking & Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org



Re: [PERFORM] perf pb solved only after pg_dump and restore

2006-08-28 Thread Guillaume Cottenceau
Guillaume,

Thanks for your help.

 On 28 Aug 2006 11:43:16 +0200, Guillaume Cottenceau [EMAIL PROTECTED] wrote:
   max_fsm_pages is 20000
   max_fsm_relations is 1000
  Do they look low?
 
 Yes, they are probably too low if you don't run VACUUM on a regular
 basis and you have a lot of UPDATE/DELETE activity. The FSM doesn't take
 a lot of memory, so it's usually recommended to have a comfortable value
 for it.

Normally, we run VACUUM ANALYZE overnight. I'd say we have low
DELETE activity, kinda high SELECT/INSERT activity, and UPDATE
would be in the middle of that.
 
 I usually recommend to read:
 http://www.pervasive-postgres.com/instantkb13/article.aspx?id=10116cNode=5K1C3W
 http://www.pervasive-postgres.com/instantkb13/article.aspx?id=10087cNode=5K1C3W
 to understand better what VACUUM and FSM mean.

Thanks for the pointer, will read that.
 
  Can you elaborate? I have created a couple of indexes (according
  to multiple models of use in our application) and they do take up
  quite some disk space (table dump is 600M but after restore it
  takes up 1.5G on disk) but I thought they could only do good or
  never be used, not impair performance..
 
 Indexes slow down write activity (you have to maintain them). It's not
 always a good idea to create them.

Of course. How newbie did I look :/. The thing is that I once did
a few measurements and noticed no (measurable) impact on INSERT
with a supplementary index, so I (wrongly) forgot about this.
 
   Maybe a VACUUM FULL followed by a REINDEX would have solved your problem.
 
  So these would have reordered the data for faster sequential
  access which is not the case of VACUUM ANALYZE?
 
 VACUUM ANALYZE won't help you if your database is completely bloated.

What do you mean exactly by bloated? If you mean that there is a
lot of (unused) data, the thing is that our trim removed most of
it. I was kinda hoping that after analyzing the database, the old
data would exit the whole picture, which obviously wasn't the
case.

About REINDEX: is it ok to consider that REINDEX is to indexes
what VACUUM FULL is to table data, because it cleans up unused
index pages?

 And AFAICS you're not running it on a regular basis so your database
 was probably completely bloated which means:
 - bloated indexes,
 - bloated tables (ie a lot of fragmentation in the pages which means
 that you need far more pages to store the same data).

I suppose that table fragmentation occurs when DELETEs are
interleaved with INSERTs?
 
 The only ways to solve this situation are either to dump/restore or run
 a VACUUM FULL ANALYZE (VERBOSE is better, to keep a log), and
 possibly reindex any bloated index (depends on your situation).

Ok.
 
   When the free_space_map is too low, VACUUM ANALYZE should have told you
   via a warning (at least, if your logging is set appropriately).
 
  Unfortunately, we didn't keep the logs of VACUUM ANALYZE, so I
  can't be sure :/
 
 You should really run VACUUM ANALYZE VERBOSE on a regular basis and
 analyze the logs to be sure your VACUUM strategy and FSM settings are
 OK.

VACUUM ANALYZE is normally run overnight (each night). Is it not
regular enough? There can be hundreds of thousands of statements
a day.

-- 
Guillaume Cottenceau
Create your personal SMS or WAP Service - visit http://mobilefriends.ch/



Re: [PERFORM] perf pb solved only after pg_dump and restore

2006-08-28 Thread Guillaume Cottenceau
  We have a couple of logs files which get larger over time
  (millions of rows). As they are log files, they can be trimmed
  from older values.
 
 Ah, ok, you DELETEd the old rows.

Yes.
 
 So I assume that you never UPDATE, but only INSERT new entries and
 sometimes DELETE a big bunch of entries from the beginning.

Actually, in the version of software where we have the problem,
that's exactly the case. But in newer versions, UPDATE come into
the picture (typically on recently inserted rows - one or two
updates per row). Does UPDATE change anything? Row selection is
done on the primary key (of SERIAL type).

 This is a special usage pattern for which the normal VACUUM is not
 well suited.
 
 DELETEing rows itself does not free any space. Only after your
 transaction is committed does a following VACUUM FULL or CLUSTER
 actually free the space.
 
 VACUUM and VACUUM ANALYZE only remove obsolete rows from the pages and
 mark them free (by entering them into the free space map, as long as
 that one is large enough). That means that your table will actually stay
 as large as before, having 90% of free pages at the beginning and 10%
 used pages at the very end. New INSERTs and UPDATEs will prefer to use
 pages from the free space map before allocating new pages, but the
 existing rows will stay where they are.

Yes, that's what I had in mind. But I assumed that performance
would be reclaimed (as if VACUUM FULL was run) because the
statistics after analyzing are accurate as to data distribution;
only disk space would not be reclaimed (but we don't care, at
least for the moment).
 
 Now, VACUUM FULL actively moves rows to the beginning of the table,
 allowing to cut the end of the table, while CLUSTER recreates the table
 from scratch, in index order. Both lead to a compact storage, having all
 used rows at the beginning, and no free pages.

I actually assumed that VACUUM ANALYZE would order rows
sequentially on disk (mainly because it was taking quite some
time and a lot of disk output activity), but obviously this was
wrong.
 
 So, I think, in your case VACUUM FULL and CLUSTER would both have solved
 your problem.

Ok.
 
   max_fsm_pages is 20000
  Do they look low?
  Notice: table data is only 600M after trim (without indexes),
  while it was probably 3x to 10x this size before the trim.
 
 10x the size means 6G, so 5.4G of data was freed by the trim. Each page
 is 8k in size, so the fsm needs about 675000 pages. So, yes, for your
 usage, they look low, and give very suboptimal results.

max_fsm_pages = 675000 means we also need to enlarge shared
buffers, or the shared buffers available space for data caching
would be reduced, right?

I guess the bottom line is that I don't understand what the Free
Space Map behaviour really is. Is it a map containing location of
free disk pages, free meaning that they correspond to pages
removed with DELETE but not yet released to the OS with VACUUM
FULL, which are used for INSERT in favor of enlarging the size of
data used on disk? If that's correct, am I right in assuming that
we don't care about the Free Space Map size if we perform a
VACUUM FULL right after large bunches of DELETE?
 
  have index bloat.
  
  Can you elaborate? I have created a couple of indexes (according
  to multiple models of use in our application) and they do take up
  quite some disk space (table dump is 600M but after restore it
  takes up 1.5G on disk) but I thought they could only do good or
  never be used, not impair performance..
 
 Like tables, indices may suffer from getting bloated by old, unused
 entries. Especially the GIST based indices in 7.4 (used by PostGIS and
 other plugins) suffered from that problem[1], but recent PostgreSQL
 versions have improved in this area.

We actually are obliged to use 7.4.5 :/

Am I correct in assuming that regularly (daily) running REINDEX
would cut this bloat?

(documentation very much insists on solving index data corruption
with REINDEX and doesn't talk much about removing old obsolete
data)

(also, is there any way to REINDEX all indexes of all tables
easily? as when we do just VACUUM ANALYZE for the whole
database)
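(I guess one could generate the statements with psql, since in 7.4
REINDEX DATABASE only handles system indexes, I think; untested sketch,
with a made-up database name:)

```shell
# Generate one REINDEX TABLE statement per user table and feed them back in
psql -t -A -d yourdb \
  -c "SELECT 'REINDEX TABLE ' || tablename || ';' FROM pg_tables WHERE schemaname = 'public'" \
  | psql -d yourdb
```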

 Now, when the query planner decides to use an index, the index access is
 extremely slow because of all the deleted entries the index scan has to
 skip.

I see.
 
 However, from the additional information you gave above, I doubt it was
 index bloat.

[...]

-- 
Guillaume Cottenceau
Create your personal SMS or WAP Service - visit http://mobilefriends.ch/



Re: [PERFORM] perf pb solved only after pg_dump and restore

2006-08-28 Thread Markus Schaber
Hi, Guillaume,

Guillaume Cottenceau wrote:

 About REINDEX: is it ok to consider that REINDEX is to indexes
 what VACUUM FULL is to table data, because it cleans up unused
 index pages?

Yes, roughly speaking.

 And AFAICS you're not running it on a regular basis so your database
 was probably completely bloated which means:
 - bloated indexes,
 - bloated tables (ie a lot of fragmentation in the pages which means
 that you need far more pages to store the same data).
 
 I suppose that table fragmentation occurs when DELETEs are
 interleaved with INSERTs?

Yes, and it gets ugly as soon as the fsm setting is too low / the VACUUM
frequency is too low, so it cannot keep up.

Big bunches of UPDATE/DELETE that hit more than, say, 20% of the table
between VACUUM runs justify a VACUUM FULL in most cases.
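With row-level statistics enabled, pg_stat_user_tables gives a rough idea
of that churn (the counters are cumulative, so compare snapshots taken
between VACUUM runs):

```sql
SELECT relname, n_tup_ins, n_tup_upd, n_tup_del
  FROM pg_stat_user_tables
 ORDER BY n_tup_upd + n_tup_del DESC;
```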

 VACUUM ANALYZE is normally run overnight (each night). Is it not
 regular enough? There can be hundreds of thousands of statements
 a day.

Which PostgreSQL version are you using? Maybe you should consider
autovacuum (which is a contrib module at least since 7.4, and included
in the server since 8.1). If you think that vacuum during working hours
puts too much load on your server, there are options to tweak that, at
least in 8.1.

Markus


-- 
Markus Schaber | Logical Tracking & Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org



Re: [PERFORM] perf pb solved only after pg_dump and restore

2006-08-28 Thread Guillaume Cottenceau
Markus Schaber schabi 'at' logix-tt.com writes:

  VACUUM ANALYZE is normally run overnight (each night). Is it not
  regular enough? There can be hundreds of thousands of statements
  a day.
 
 Which PostgreSQL version are you using? Maybe you should consider
 autovacuum (which is a contrib module at least since 7.4, and included
 in the server since 8.1). If you think that vacuum during working hours
 puts too much load on your server, there are options to tweak that, at
 least in 8.1.

Ok, thanks. Unfortunately production insists on sticking on 7.4.5
for the moment :/

-- 
Guillaume Cottenceau
Create your personal SMS or WAP Service - visit http://mobilefriends.ch/
