Re: [PERFORM] Backup strategies

2008-10-15 Thread Peter Childs
2008/10/15 Ivan Voras [EMAIL PROTECTED]:
 Matthew Wakeling wrote:
 On Wed, 15 Oct 2008, Ivan Voras wrote:
 Nope. Even files in data directory change. That's why the documentation
 warns against tools that emit errors for files that change during the
 copy.

 Ok, thanks. This is a bit off-topic, but if it's not how I imagine it,
 then how is it implemented?

 The files may change, but it doesn't matter, because there is enough
 information in the xlog to correct it all.

 I'm thinking about these paragraphs in the documentation:

 
 Be certain that your backup dump includes all of the files underneath
 the database cluster directory (e.g., /usr/local/pgsql/data). If you are
 using tablespaces that do not reside underneath this directory, be
 careful to include them as well (and be sure that your backup dump
 archives symbolic links as links, otherwise the restore will mess up
 your tablespaces).

 You can, however, omit from the backup dump the files within the
 pg_xlog/ subdirectory of the cluster directory. This slight complication
 is worthwhile because it reduces the risk of mistakes when restoring.
 This is easy to arrange if pg_xlog/ is a symbolic link pointing to
 someplace outside the cluster directory, which is a common setup anyway
 for performance reasons.
 

 So, pg_start_backup() freezes the data at the time it's called but still
 data and xlog are changed, in a different way that's safe to backup? Why
 not run with pg_start_backup() always enabled?


Because nothing would get vacuumed and your data would just grow and grow.

Your data is held at the point in time when you ran pg_start_backup(),
so when you restore, your data is back at that point. If you need to go
forward you need the xlog (hence "point in time" backup).

This is all part of the MVCC feature that PostgreSQL has.

PostgreSQL never deletes anything until nothing can read it any more, so
if you vacuum during a backup it will only remove rows that were already
finished with before the backup started.

If you don't do a pg_start_backup() first you don't have this promise
that vacuum will not remove something you need. (I think checkpoints
might come into this as well, but I'm not sure how.)

Or at least that's my understanding...

So if your base backup takes a while I would advise running vacuum
afterwards. But if you're running autovacuum there is probably very
little need to worry.
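
For reference, the sequence I have in mind looks roughly like this (the
label, paths and copy command are only illustrative):

SELECT pg_start_backup('nightly_base_backup');

-- then, from the shell, copy the cluster with something that tolerates
-- files changing underneath it, skipping pg_xlog/ as the docs suggest:
--   rsync -a --exclude=pg_xlog /usr/local/pgsql/data/ /backup/base/

SELECT pg_stop_backup();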

Peter Childs



[PERFORM] Slow Inserts on large tables

2008-10-03 Thread Peter Childs
I have a problem whereby an insert on a large table will sometimes
take longer than usual.

Usually the inserts are quick, but from time to time they will take a
long time, sometimes as much as 10 seconds or longer. (They are usually
under 500ms, which is the point at which I start logging them.)

The rows are slowly drip-fed, so bulk loading really is not an option;
it's logging data, used mostly for analysis and historical purposes.

I think the problem might have something to do with checkpoints. I'm
relatively sure it's not the table expanding, as I've run a vacuum
verbose straight away after a longer insert and not found loads of
free space in the fsm.
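
For reference, the relevant settings would look something like this (the
first matches my 500ms logging threshold; the second is one I'm thinking
of turning on to confirm the checkpoint theory):

log_min_duration_statement = 500   # ms
log_checkpoints = on               # 8.3; logs each checkpoint so I can see
                                   # whether they line up with slow inserts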

I'm using 8.3.1 (I thought I'd upgraded to 8.3.3 but it does not look
like the upgrade worked). I'm more than happy to upgrade, I just have to
find the downtime (even a few seconds can be difficult).

Any help would be appreciated.

Regards

Peter Childs



Re: [PERFORM] Slow Inserts on large tables

2008-10-03 Thread Peter Childs
2008/10/3 Peter Eisentraut [EMAIL PROTECTED]:
 Peter Childs wrote:

 I have a problem where by an insert on a large table will sometimes
 take longer than usual.

 I think the problem might have something to do with checkpoints,

 Then show us your checkpointing-related parameters.  Or try to set them to a
 lot higher values so checkpoints happen more rarely and see if that makes a
 difference.



More often or less often?

I've currently got them set to

checkpoint_segments = 3
checkpoint_timeout = 180s
checkpoint_completion_target = 0.5

after reading that doing more, smaller checkpoints might make each
checkpoint quicker and hence less of a performance hit when they
actually happen.
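
If higher is the way to go, I assume you mean something in this direction
(numbers plucked out of the air as a starting point, not tested):

checkpoint_segments = 16
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9

i.e. fewer, larger checkpoints, spread over more of the checkpoint interval.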

Regards

Peter



Re: [PERFORM] Replication Syatem

2008-04-28 Thread Peter Childs
2008/4/28 Gauri Kanekar [EMAIL PROTECTED]:

 All,

 We have a table table1 which get insert and updates daily in high
 numbers, bcoz of which its size is increasing and we have to vacuum it every
 alternate day. Vacuuming table1 take almost 30min and during that time the
 site is down.

 We need to cut down on this downtime.So thought of having a replication
 system, for which the replicated DB will be up during the master is getting
 vacuumed.

 Can anybody guide which will be the best suited replication solution for
 this.

 Thanx for any help
 ~ Gauri


I hope you're not using VACUUM FULL... (standard reply for this type of
question)

What version of PostgreSQL are you using?

Have you tried autovacuum?

Run plain vacuum on this table even more often (like every half hour)
and it should not take as long, and it will save space.

If you still have trouble, run vacuum analyse verbose table1; and see
what it says.

If you're doing it right you should be able to vacuum with the database up.

It sounds like you might be happier with a fix for the problem rather than
a complex workaround which will actually solve a completely different
problem.
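
For example, a cron entry something like this would give you a plain
vacuum of that table every half hour (substitute your own database name):

0,30 * * * *  /usr/local/bin/vacuumdb -z -t table1 -d yourdb

The -z makes it analyse at the same time.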

Regards

Peter.


Re: [PERFORM] Commit takes a long time.

2008-01-04 Thread Peter Childs
On 03/01/2008, Tom Lane [EMAIL PROTECTED] wrote:

 Peter Childs [EMAIL PROTECTED] writes:
  Using Postgresql 8.1.10 every so often I get a transaction that takes a
  while to commit.

  I log everything that takes over 500ms and quite reguallly it says
 things
  like

  707.036 ms statement: COMMIT

 AFAIK there are only two likely explanations for that:

 1. You have a lot of deferred triggers that have to run at COMMIT time.

 2. The disk system gets so bottlenecked that fsync'ing the commit record
 takes a long time.

 If it's #2 you could probably correlate the problem with spikes in I/O
 activity as seen in iostat or vmstat.

 If it is a disk usage spike then I would make the further guess that
 what causes it might be a Postgres checkpoint.  You might be able to
 dampen the spike a bit by playing with the checkpoint parameters, but
 the only real fix will be 8.3's spread-out-checkpoints feature.

 regards, tom lane



2 seems most likely, as they seem to occur more often when large queries
are running (they are often followed by a record for a very, very long
query in a different transaction) or at particularly busy periods when
quite a lot of other short queries are also taking place.

I'm planning an upgrade to 8.3 once it's out anyway, so that might
improve things.
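
To check for that correlation I'll probably just leave something like the
following running and line the timestamps up against the slow COMMIT
entries in the log (the 5 second interval is arbitrary):

iostat -x 5
vmstat 5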

Peter.


[PERFORM] Commit takes a long time.

2008-01-03 Thread Peter Childs
Using PostgreSQL 8.1.10, every so often I get a transaction that takes a
while to commit.

I log everything that takes over 500ms and quite regularly it says things
like

707.036 ms statement: COMMIT

Is there any way to speed this up?

Peter Childs


Re: [PERFORM] doubt with pg_dump and high concurrent used databases

2007-11-26 Thread Peter Childs
On 25/11/2007, Pablo Alcaraz [EMAIL PROTECTED] wrote:

 Tom Lane wrote:
  Peter Childs [EMAIL PROTECTED] writes:
 
  On 25/11/2007, Erik Jones [EMAIL PROTECTED] wrote:
 
  Does the pg_dump create this kind of consistent backups? Or do I
  need to do the backups using another program?
 
  Yes, that is exactly what pg_dump does.
 
 
  Yes so long as you are using transactions correctly. Ie doing a begin
 before
  each invoice and a commit afterwards if your not bothering and using
 auto
  commit you *may* have problems.
 
 
  I think you need to qualify that a bit more.  What you're saying is that
  if an application has consistency requirements that are momentarily
  violated during multi-statement updates, and it fails to wrap such
  updates into a single transaction, then pg_dump could capture one of the
  intermediate states.  That's true, but it's hardly pg_dump's fault.
  If there were a system crash partway through such a sequence, the
  consistency requirements would be violated afterwards, too.
 
 

 Agree. In my case I define consistent database state like the state
 the database has when the program that use it is stopped normally and
 without errors. In this state the program starts without troubles and
 everything looks fine. I believe this behavior is because all the
 inserts and updates are made using transactions. Another things will be
 a bug, it ll be fixed and it ll not be pg_dump fault.

 So if pg_dump can capture a consistent state with all the data until
 the start time, without all the pending open transaction updates/inserts
 in the same way that I did when I stopped the program before start
 pg_dump, for me is usefull and enough to solve my problem.

 Thanks to all!

 Pablo


Given your long description of what you thought was consistent, I thought
it important that the answer "yes, but" was given rather than just a plain
"yes". I've met quite a few apps that create inconsistent data even when
the database itself is actually consistent.

Peter


Re: [PERFORM] doubt with pg_dump and high concurrent used databases

2007-11-25 Thread Peter Childs
On 25/11/2007, Erik Jones [EMAIL PROTECTED] wrote:

 On Nov 25, 2007, at 10:46 AM, Pablo Alcaraz wrote:

  Hi all,
 
  I read that pg_dump can run while the database is being used and makes
  consistent backups.
 
  I have a huge and *heavy* selected, inserted and updated database.
  Currently I have a cron task that disconnect the database users,
  make a
  backup using pg_dump and put the database online again. The problem
  is,
  now there are too much information and everyday the database store
  more
  and more data, the backup process needs more and more time to run
  and I
  am thinking about to do the backup using a process that let me to
  do it
  with the minimal interruptions for the users.
 
  I do not need a last second backup. I could the a backup with almost
  all the data but I need the information on it to be coherent. For
  example, if the backup store information about an invoice it *must* to
  store both header and items invoice information. I could live if the
  backup does not store some invoices information when is ran, because
  they ll be backuped the next time the backup process run. But I can
  not
  store only a part of the invoices. That is I call a coherent backup.
 
  The best for me is that the cron tab does a concurrent backup with all
  the information until the time it starts to run while the clients are
  using the database. Example: if the cron launch the backup process at
  12:30 AM, the backup moust be builded with all the information *until*
  12:30AM. So if I need to restore it I get a database coherent with the
  same information like it was at 12:30AM. it does not matter if the
  process needs 4 hours to run.
 
  Does the pg_dump create this kind of consistent backups? Or do I
  need
  to do the backups using another program?

 Yes, that is exactly what pg_dump does.


Yes, so long as you are using transactions correctly, i.e. doing a BEGIN
before each invoice and a COMMIT afterwards. If you're not bothering and
are using auto-commit you *may* have problems. pg_dump will show a
consistent state as of the time when the backup was started. If your
database was not consistent at that time you may have issues, but it will
be consistent from a database point of view, i.e. foreign keys, primary
keys, check constraints, triggers etc.
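
By using transactions correctly I mean something along these lines (the
table and column names are invented for the example):

BEGIN;
INSERT INTO invoice (invoice_id, customer) VALUES (42, 'ACME');
INSERT INTO invoice_line (invoice_id, item, qty) VALUES (42, 'widget', 3);
COMMIT;

pg_dump will then see either the whole invoice or none of it.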

It all depends what you mean by consistent.

Peter.


Re: [PERFORM] Long Running Commits - Not Checkpoints

2007-09-14 Thread Peter Childs
On 13/09/2007, Greg Smith [EMAIL PROTECTED] wrote:


 Every time the all scan writes a buffer that is frequently used, that
 write has a good chance that it was wasted because the block will be
 modified again before checkpoint time.  Your settings are beyond regular
 aggressive and into the hyperactive terrority where I'd expect such
 redundant writes are happening often.  I'd suggest you try to move toward
 dropping bgwriter_all_percent dramatically from its current setting and
 see how far down you can go before it starts to introduce blocks at
 checkpoint time.  With bgwriter_delay set to 1/4 the default, I would
 expect that even 5% would be a high setting for you.  That may be a more
 dramatic change than you want to make at once though, so lowering it in
 that direction more slowly (perhaps drop 5% each day) and seeing whether
 things improve as that happens may make more sense.


Are you suggesting that reducing bgwriter_delay and bgwriter_all_percent
would reduce the time spent doing commits?

I get quite a few commits that take over 500ms (the point at which I start
logging queries). I always thought of them as just one of those things, but
if they can be reduced by changing a few config variables that would be
great. I'm just trying to work out which figures are worth trying, to see
if I can reduce them.

From time to time I get commits that take 6 or 7 seconds, but not all the
time.

I'm currently working with the defaults.

Peter Childs


Re: [PERFORM] Long Running Commits - Not Checkpoints

2007-09-14 Thread Peter Childs
On 14/09/2007, Peter Childs [EMAIL PROTECTED] wrote:



 On 13/09/2007, Greg Smith [EMAIL PROTECTED] wrote:
 
 
  Every time the all scan writes a buffer that is frequently used, that
  write has a good chance that it was wasted because the block will be
  modified again before checkpoint time.  Your settings are beyond regular
 
  aggressive and into the hyperactive terrority where I'd expect such
  redundant writes are happening often.  I'd suggest you try to move
  toward
  dropping bgwriter_all_percent dramatically from its current setting and
  see how far down you can go before it starts to introduce blocks at
  checkpoint time.  With bgwriter_delay set to 1/4 the default, I would
  expect that even 5% would be a high setting for you.  That may be a more
  dramatic change than you want to make at once though, so lowering it in
  that direction more slowly (perhaps drop 5% each day) and seeing whether
  things improve as that happens may make more sense.
 
 
 Are you suggesting that reducing bgwriter_delay and bg_writer_percent
 would reduce the time spent doing commits?

 I get quite a few commits that take over 500ms (the point when i start
 logging queries). I always thought oh just one of those things but if they
 can be reduced by changing a few config variables that would be great. I'm
 just trying to workout what figures are worth trying to see if I can reduce
 them.

 From time to time I get commits that take 6 or 7 seconds but not all the
 time.

 I'm currently working with the defaults.

 Peter Childs


Hmm, always read the manual: increase them from the defaults...
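
For the record, the knobs I'm now looking at are these (8.1/8.2 parameter
names; the values are only my first guess at "increased from the defaults",
not advice):

bgwriter_delay = 200ms          # default
bgwriter_lru_percent = 5.0      # up from 1.0
bgwriter_lru_maxpages = 100     # up from 5
bgwriter_all_percent = 1.0      # up from 0.333
bgwriter_all_maxpages = 50      # up from 5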

Peter.


Re: [PERFORM] optimize query with a maximum(date) extraction

2007-09-05 Thread Peter Childs
On 05/09/07, Gregory Stark [EMAIL PROTECTED] wrote:

 Gregory Stark [EMAIL PROTECTED] writes:

  JS Ubei [EMAIL PROTECTED] writes:
 
  I need to improve a query like :
 
  SELECT id, min(the_date), max(the_date) FROM my_table GROUP BY id;
 ...
  I don't think you'll find anything much faster for this particular
 query. You
  could profile running these two (non-standard) queries:
 
  SELECT DISTINCT ON (id) id, the_date AS min_date FROM my_table ORDER BY
 id, the_date ASC
  SELECT DISTINCT ON (id) id, the_date AS max_date FROM my_table ORDER BY
 id, the_date DESC

 Something else you might try:

 select id,
(select min(the_date) from my_table where id=x.id) as min_date,
(select max(the_date) from my_table where id=x.id) as max_date
   from (select distinct id from my_table)

 Recent versions of Postgres do know how to use the index for a simple
 ungrouped min() or max() like these subqueries.

 This would be even better if you have a better source for the list of
 distinct
 ids you're interested in than my_table. If you have a source that just has
 one
 record for each id then you won't need an extra step to eliminate
 duplicates.


My personal reaction is: why are you using DISTINCT at all?

Why not

select id,
       min(the_date) as min_date,
       max(the_date) as max_date
  from my_table
 group by id;

Since 8.0 (or was it earlier?) this will use an index, should a reasonable
one exist.

Peter.


Re: [PERFORM] setting up raid10 with more than 4 drives

2007-05-30 Thread Peter Childs

On 30/05/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:


On Wed, 30 May 2007, Jonah H. Harris wrote:

 On 5/29/07, Luke Lonergan [EMAIL PROTECTED] wrote:
  AFAIK you can't RAID1 more than two drives, so the above doesn't make
  sense
  to me.

 Yeah, I've never seen a way to RAID-1 more than 2 drives either.  It
 would have to be his first one:

 D1 + D2 = MD0 (RAID 1)
 D3 + D4 = MD1 ...
 D5 + D6 = MD2 ...
 MD0 + MD1 + MD2 = MDF (RAID 0)


I don't know what the failure mode ends up being, but on linux I had no
problems creating what appears to be a massively redundant (but small)
array

md0 : active raid1 sdo1[10](S) sdn1[8] sdm1[7] sdl1[6] sdk1[5] sdj1[4]
sdi1[3] sdh1[2] sdg1[9] sdf1[1] sde1[11](S) sdd1[0]
   896 blocks [10/10] [UU]

David Lang



Good point. Also, if you had RAID 1 with 3 drives and some bit errors, at
least you can take a vote on what's right, whereas if you only have 2 and
they disagree, how do you know which is right other than to pick one and
hope? Either way it will be slower to keep in sync on a heavy-write system.
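
On Linux md, for instance, a three-way mirror is simply (device names
invented):

mdadm --create /dev/md0 --level=1 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1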

Peter.


Re: [PERFORM] Key/Value reference table generation: INSERT/UPDATE performance

2007-05-22 Thread Peter Childs

On 22 May 2007 01:23:03 -0700, valgog [EMAIL PROTECTED] wrote:


I found several post about INSERT/UPDATE performance in this group,
but actually it was not really what I am searching an answer for...

I have a simple reference table WORD_COUNTS that contains the count of
words that appear in a word array storage in another table.

CREATE TABLE WORD_COUNTS
(
  word text NOT NULL,
  count integer,
  CONSTRAINT PK_WORD_COUNTS PRIMARY KEY (word)
)
WITHOUT OIDS;




Is there any reason why count is not NOT NULL? (That should simplify your
code by removing the COALESCE.)

INSERT is more efficient than UPDATE, because an UPDATE is always a delete
followed by an insert.

Oh, and GROUP BY is nearly always quicker than DISTINCT, and can (always?)
be rewritten as such. I'm not 100% sure why it's different, but it is.
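
Roughly what I mean on the first point (your table, just with the default
added), and what the update then looks like:

CREATE TABLE WORD_COUNTS
(
  word text NOT NULL,
  count integer NOT NULL DEFAULT 0,
  CONSTRAINT PK_WORD_COUNTS PRIMARY KEY (word)
)
WITHOUT OIDS;

-- no COALESCE needed any more:
UPDATE WORD_COUNTS SET count = count + 1 WHERE word = 'example';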

Peter.



I have some PL/pgSQL code in a stored procedure like


  FOR r
   IN select id, array_of_words
from word_storage
  LOOP
begin
  -- insert the missing words
  insert into WORD_COUNTS
  ( word, count )
  ( select word, 0
  from ( select distinct (r.array_of_words)
[s.index] as d_word
   from generate_series(1,
array_upper( r.array_of_words, 1 ) ) as s(index) ) as distinct_words
 where word not in ( select d_word from
WORD_COUNTS  ) );
  -- update the counts
  update WORD_COUNTS
 set count = COALESCE( count, 0 ) + 1
   where word in ( select distinct (r.array_of_words)[s.index] as
word
from generate_series(1,
array_upper( r.array_of_words, 1) ) as s(index) );
exception when others then
  error_count := error_count + 1;
end;
record_count := record_count + 1;
  END LOOP;

This code runs extremely slowly. It takes about 10 minutes to process
1 records and the word storage has more then 2 million records to
be processed.

Does anybody have a know-how about populating of such a reference
tables and what can be optimized in this situation.

Maybe the generate_series() procedure to unnest the array is the place
where I loose the performance?

Are the set update/inserts more effitient, then single inserts/updates
run in smaller loops?

Thanks for your help,

Valentine Gogichashvili





Re: [PERFORM] Vacuumdb - Max_FSM_Pages Problem.

2007-02-26 Thread Peter Childs

On 26/02/07, Pallav Kalva [EMAIL PROTECTED] wrote:

Hi,

 I am in the process of cleaning up one of our big table, this table
has 187 million records and we need to delete around 100 million of them.

 I am deleting around 4-5 million of them daily in order to catchup
with vacuum and also with the archive logs space. So far I have deleted
around 15million in past few days.

 max_fsm_pages value is set to 120. Vacuumdb runs once daily,
here is the output from last night's vacuum job


===
 INFO:  free space map: 999 relations, 798572 pages stored; 755424
total pages needed
 DETAIL:  Allocated FSM size: 1000 relations + 120 pages = 7096
kB shared memory.
 VACUUM



 From the output it says 755424 total pages needed , this number
keeps growing daily even after vacuums are done daily. This was around
350K pages before the delete process started.

 I am afraid that this number will reach the max_fsm_pages limit
soon and vacuums thereafter will never catch up .

 Can anyone please explain this behavior ? What should I do to catch
up with vacuumdb daily ?



Vacuum adds free pages to the fsm so that they can be reused. If you
don't fill up those free pages, the fsm will fill up. Once the fsm is
full, no more pages can be added to it. If you start writing to the
free pages via inserts, then when vacuum next runs, the free pages
that did not fit previously (because the map was full) can be added.

If you are really deleting that many records you may be better off
copying the rows you want to keep into a new table and dropping the
old one. To actually recover space you need to either run vacuum full
or cluster.
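
For example, either of these rewrites the table and gives the space back
to the operating system (table and index names are placeholders):

CLUSTER bigtable_pkey ON bigtable;
-- or
VACUUM FULL VERBOSE bigtable;

Both take an exclusive lock on the table while they run, so they need a
quiet period.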

This ought to be in the manual somewhere as this question gets asked
about once a week.

Peter.



Re: [PERFORM] Planner statistics, correlations

2007-01-12 Thread Peter Childs

On 12/01/07, Tobias Brox [EMAIL PROTECTED] wrote:

We have a table with a timestamp attribute (event_time) and a state flag
which usually changes value around the event_time (it goes to 4).  Now
we have more than two years of events in the database, and around 5k of
future events.

It is important to frequently pick out overdue events, say:

  select * from events where state <> 4 and event_time < now()

This query would usually yield between 0 and 100 rows - however, the
planner doesn't see the correlation between state and event_time - since
most of the events have event_time < now(), the planner also assumes most of
the events with state <> 4 have event_time < now(), so the expected number of
rows is closer to 5k.  This matters, because I have a query with joins,
and I would really benefit from nested loops.

(I've tried replacing now() above with different timestamps from the
future and the past.  I'm using pg 8.2)

Any suggestions?



Can you say what state might be, rather than what it is not? I'm guessing
that state is an int but that there is only a limited list of possible
states; if you can say what it might be rather than what it is not, the
index is more likely to be used.
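
i.e. something like this, assuming the "live" states are a short, known
list (the values here are invented):

select * from events where state in (1, 2, 3) and event_time < now();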

Peter.



Re: [PERFORM] GROUP BY vs DISTINCT

2006-12-20 Thread Peter Childs

On 20/12/06, Steinar H. Gunderson [EMAIL PROTECTED] wrote:

On Tue, Dec 19, 2006 at 11:19:39PM -0800, Brian Herlihy wrote:
 Actually, I think I answered my own question already.  But I want to
 confirm - Is the GROUP BY faster because it doesn't have to sort results,
 whereas DISTINCT must produce sorted results?  This wasn't clear to me from
 the documentation.  If it's true, then I could save considerable time by
 using GROUP BY where I have been using DISTINCT in the past.  Usually I
 simply want a count of the distinct values, and there is no need to sort
 for that.

You are right; at the moment, GROUP BY is more intelligent than DISTINCT,
even if they have to compare the same columns. This is, as always, something
that could be improved in a future release, TTBOMK.

/* Steinar */


Oh, so that's why GROUP BY is nearly always quicker than DISTINCT. I
always thought DISTINCT was just shorthand for "GROUP BY the same columns
as I've just selected".
Is it actually in the SQL spec that a DISTINCT must sort, or could we just
get the parser to rewrite DISTINCT into GROUP BY and hence remove the
extra code that a different way of doing it must mean?
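
Just to spell out the rewrite I mean, for some hypothetical table t:

SELECT DISTINCT a FROM t;
-- would become
SELECT a FROM t GROUP BY a;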

Peter.



[PERFORM] Massive delete of rows, how to proceed?

2006-11-25 Thread Peter Childs

On 24/11/06, Arnau [EMAIL PROTECTED] wrote:

Hi all,

   I have a table with statistics with more than 15 million rows. I'd
like to delete the oldest statistics and this can be about 7 million
rows. Which method would you recommend me to do this? I'd be also
interested in calculate some kind of statistics about these deleted
rows, like how many rows have been deleted for date. I was thinking in
creating a function, any recommendations?



Copy the rows you want to keep and drop the old table. If you delete,
you will have a massive problem with a bloated table, and vacuum will
not help unless you expect the table to grow to this size regularly;
otherwise vacuum full will take ages.
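
A rough sketch of what I mean, including the per-date counts you asked
about (table, column and cut-off date are invented):

-- statistics about what is going to be removed
SELECT date_trunc('day', stat_time) AS day, count(*)
  FROM statistics
 WHERE stat_time < '2006-05-01'
 GROUP BY day
 ORDER BY day;

-- keep only the recent rows, then swap the tables
CREATE TABLE statistics_new AS
  SELECT * FROM statistics WHERE stat_time >= '2006-05-01';
DROP TABLE statistics;
ALTER TABLE statistics_new RENAME TO statistics;
-- indexes, constraints and permissions must be recreated by hand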

Peter.



Re: [PERFORM] Identifying bloated tables

2006-08-29 Thread Peter Childs

On 28/08/06, Michal Taborsky - Internet Mall [EMAIL PROTECTED] wrote:

Markus Schaber napsal(a):
 Hi, Michal,

 Michal Taborsky - Internet Mall wrote:

 When using this view, you are interested in tables, which have the
 bloat column higher that say 2.0 (in freshly dump/restored/analyzed
 database they should all be around 1.0).

 I just noticed some columns in pg_catalog with a bloat value < 1 and a
 negative wasted space - is this due to the pseudo nature of them?

It is more likely due to the fact, that these numbers are just
estimates, based on collected table statistics, so for small or
non-standard tables the statistical error is greater that the actual
value. You are usually not interested in tables, which have wasted space
of 1000kB or -1000kB. Also the database must be ANALYZEd properly for
these numbers to carry any significance.



I was just playing around with this view and noticed it performs badly
on tables with very small record sizes. This seems to be because it
ignores the per-row system overhead (oid, xmin, ctid etc.), which seems
to be about 28 bytes per record; this can be quite significant in
small-record tables and can cause trouble even with a small number of
records. Hence I've got a table that's static and freshly VACUUM FULLed
which reads with a bloat of 4.

It's easy to recreate the problem:

create table regionpostcode (area varchar(4), regionid int);

then insert 12 records.

Peter.



Re: [PERFORM] Large Table With Only a Few Rows

2006-02-27 Thread Peter Childs
On 27/02/06, Chris Browne [EMAIL PROTECTED] wrote:
 Nik [EMAIL PROTECTED] writes:
  I have a table that has only a few records in it at the time, and
  they get deleted every few seconds and new records are inserted.
  Table never has more than 5-10 records in it. However, I noticed a
  deteriorating performance in deletes and inserts on it. So I performed
  vacuum analyze on it three times (twice in a row, and once two days
  later). In the statistics it says that the table size is 863Mb, toast
  table size is 246Mb, and indexes size is 134Mb, even though the table
  has only 5-10 rows in it. I was wondering how can I reclaim all this
  space and improve the performance?

 You need to run VACUUM ANALYZE on this table very frequently. Based on
 what you describe, "very frequently" should be on the order of at least
 once per minute. Schedule a cron job specifically to vacuum this table,
 with a cron entry like the following:

 * * * * * /usr/local/bin/vacuumdb -z -t my_table -p 5432 my_database

 Of course, you need to bring it back down to size, first. You could run
 CLUSTER on the table to bring it back down to size; that's probably the
 fastest way...

 cluster my_table_pk on my_table;

 VACUUM FULL would also do the job, but probably not as quickly.
You probably want to do one or two other things.

1. Switch on autovacuum.

2. Improve the setting of max_fsm_pages in your postgresql.conf (a restart
will be required).

If you do a "vacuum verbose;", the last couple of lines should tell you
how much free space there is against how much free space the database
can actually remember to use.

INFO: free space map contains 5464 pages in 303 relations
DETAIL: A total of 9760 page slots are in use (including overhead).
9760 page slots are required to track all free space.
Current limits are: 4 page slots, 1000 relations, using 299 KB.

If the required page slots (9760 in my case) go above the current
limit (4 in my case) you will need to do a vacuum full to reclaim
the free space (a cluster of the relevant tables may also work).

If you run vacuum verbose regularly you can check that you are vacuuming
often enough and that your free space map is big enough to hold your
free space.
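
If it helps, the two settings look something like this in postgresql.conf
(the numbers are placeholders; max_fsm_pages just needs to cover the "page
slots required" figure that vacuum verbose reports):

autovacuum = on        # 8.1+; 8.1 also needs stats_start_collector = on
                       # and stats_row_level = on
max_fsm_pages = 100000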

Peter Childs
 


Re: [PERFORM] Postgres slower than MS ACCESS

2006-02-16 Thread Peter Childs
On 15/02/06, Jay Greenfield [EMAIL PROTECTED] wrote:
I've been vacuuming between each test run. Not vacuuming results in times
all the way up to 121 minutes. For a direct comparison with Access, the
vacuuming time with Postgres should really be included as this is not
required with Access.


Hmm, but then you would have to include the Access equivalent of vacuum
too. I think you will find Tools - Database Utilities - Compact Database
performs a similar function and is just as important; I've seen many
Access databases bloat in my time.

Peter Childs 



Re: [PERFORM] Performance, vacuum and reclaiming space, fsm

2003-10-13 Thread Peter Childs
On Mon, 13 Oct 2003, Seum-Lim Gan wrote:

 Hi,
 
 I did a search in the discussion lists and found several
 pointers about setting the max_fsm_relations and pages.
 
 I have a table that keeps being updated and noticed
 that after a few days, the disk usage has growned to
 from just over 150 MB to like 2 GB !
 
 I followed the recommendations from the various search
 of the archives, changed the max_fsm_relations, pages,
 keep doing vacuum like every minute while the
 table of interest in being updated. I kept
 watching the disk space usage and still noticed that
 it continues to increase.
 
 Looks like vacuum has no effect.
 
 I did vacuum tablename and don't intend to use
 the full option since it locks the table.
 
 I have 7.3.3 running in Solaris 9.
 
 Any recommendation ?
 
 Thanks.
 
 Gan
 

	Try auto_vacuum (it's in the 7.4beta4 contrib directory); I find it
very useful. Often you find that every minute can in fact be a little too
often. My table updates every couple of seconds but is vacuumed
(automatically) roughly every hour.
	If you have lots of overlapping vacuums and/or editing connections,
records may be held on to by one vacuum so that the next can't do its job.
Always ensure that there is only one vacuum process. (You can't do this
easily with cron!)
	I'm still using 7.3.2. 7.3.3 is supposed to have some big bugs and
7.3.4 was produced within 24 hours (must upgrade at some point).
	Oh yes, indexes have problems too (I think this is fixed in later
versions...) so you might want to try a reindex.

	They are all worth a try; that's a brief summary of what's been on
-performance for weeks and weeks now.

Peter Childs




Re: [PERFORM] Speeding up Aggregates

2003-10-10 Thread Peter Childs
On Thu, 9 Oct 2003, Dror Matalon wrote:

 On Thu, Oct 09, 2003 at 07:07:00PM -0400, Greg Stark wrote:
  Dror Matalon [EMAIL PROTECTED] writes:
  
   Actually what finally sovled the problem is repeating the 
   dtstamp > last_viewed
   in the sub select
  
  That will at least convince the optimizer to use an index range lookup. But it
  still will have to scan every record that matches channel==$1, link==$2, and
  dtstamp > $3.
  
  The trick of using limit 1 will be faster still as it only has to retrieve a
  single record using the index. But you have to be sure to convince it to use
 
 How is doing order by limit 1 faster than doing max()? Seems like the
 optimizer will need to sort or scan the data set either way. That part
 didn't actually make a difference in my specific case.


max(field) = sequential scan looking for the highest.

order by field desc limit 1 = index scan (if available), read the first
record;
else (if no index) a sequential scan for the highest.

Aggregates don't use indexes because doing so is only applicable for
max() and min() and can't be done for sum(), count(), etc. Writing an
algorithm to use the index would be complex, as you would need to tell
the optimizer from inside a function (you can write aggregate functions
yourself if you wish) to do something slightly differently.

For my large table:
select max(field) from table;                         (5264.21 msec)
select field from table order by field desc limit 1;  (54.88 msec)

Peter Childs




Re: [PERFORM] way to speed up a SELECT DISTINCT?

2003-10-10 Thread Peter Childs
On Thu, 9 Oct 2003, Seth Ladd wrote:

 Hello,
 
 I am running 7.3.2 RPMs on RH9, on a celeron 1.7 w/ 1gig ram.
 
 I have a table that has 6.9 million rows, 2 columns, and an index on 
 each column.  When I run:
 
 SELECT DISTINCT column1 FROM table
 
 It is very, very slow (10-15 min to complete).  An EXPLAIN shows no 
 indexes are being used.
 
 Is there any way to speed this up, or is that DISTINCT going to keep 
 hounding me?
 
 I checked the mailing list, and didn't see anything like this.
 
 Any tips or hints would be greatly appreciated.  Thanks for your help!
 Seth
 
 
	Try group by instead. I think this is an old bug; it's fixed in
7.3.2, which is what I'm using.

Peter Childs


[EMAIL PROTECTED]:express=# explain select distinct region from region;
  QUERY PLAN
--
 Unique  (cost=0.00..4326.95 rows=9518 width=14)
   ->  Index Scan using regionview_region on region  (cost=0.00..4089.00 rows=95183 width=14)
(2 rows)

[EMAIL PROTECTED]:express=# explain select distinct region from region group 
by region;
 QUERY PLAN

 Unique  (cost=0.00..4350.75 rows=952 width=14)
   ->  Group  (cost=0.00..4326.95 rows=9518 width=14)
         ->  Index Scan using regionview_region on region  (cost=0.00..4089.00 rows=95183 width=14)
(3 rows)






Re: [PERFORM] way to speed up a SELECT DISTINCT?

2003-10-10 Thread Peter Childs
On Fri, 10 Oct 2003, Seth Ladd wrote:

  Is there any way to speed this up, or is that DISTINCT going to keep
  hounding me?
 
  I checked the mailing list, and didn't see anything like this.
 
  Any tips or hints would be greatly appreciated.  Thanks for your help!
  Seth
 
 
  Try group by instead. I think this is an old bug its fixed in
  7.3.2 which I'm using.
 
  Peter Childs
  `
 
 
  [EMAIL PROTECTED]:express=# explain select distinct region from region;
QUERY PLAN
  --- 
  ---
   Unique  (cost=0.00..4326.95 rows=9518 width=14)
 -  Index Scan using regionview_region on region   
  (cost=0.00..4089.00
  rows=95183 width=14)
  (2 rows)
 
 Thanks for the tip, I'll give this a shot soon.  I am curious, your  
 example above does not use GROUP BY yet you have an INDEX SCAN.  I am  
 using a similar query, yet I get a full table scan.  I wonder how they  
 are different?
 
 I'll try the group by anyway.
 
	It's a guess, but ANALYSE might help.

Peter Childs




Re: [PERFORM] Some vacuum tuning help

2003-08-06 Thread Peter Childs
On Tue, 5 Aug 2003, Shridhar Daithankar wrote:

 On 5 Aug 2003 at 8:09, Jeff wrote:
 
 I would suggest autovacuum daemon which is in CVS contrib  works for 7.3.x as 
 well.. Or schedule a vacuum analyze every 15 minutes or so..

	I've just got autovacuum up, and since we have had a lot of talk
about it recently I thought some feedback might be useful.
	It seems to work quite well, but it can be rather zealous in its
analysing for the first few hours. Currently it's analysing static (i.e.
nothing's changed) tables every 10 minutes. Vacuums seem to be about right.
	I think that many vacuums may be slowing down my database.

Peter Childs




Re: [PERFORM] Optimization

2003-07-29 Thread Peter Childs
On Tue, 29 Jul 2003, Shridhar Daithankar wrote:

 On 28 Jul 2003 at 12:27, Josh Berkus wrote:
  Unless you're running PostgreSQL 7.1 or earlier, you should be VACUUMing every 
  10-15 minutes, not every 2-3 hours.   Regular VACUUM does not lock your 
  database.  You will also want to increase your FSM_relations so that VACUUM 
  is more effective/efficient; again, see the articles.
 
 There is an auto-vacuum daemon in contrib and if I understand it correctly, it 
 is not getting much of a field testing. How about you guys installing it and 
 trying it?

	If there is such a daemon, what is it called? I can't see it.
Is it part of gborg?

Peter Childs

