[PERFORM] Vacuum Analyze problem

2007-09-04 Thread rafael
Hello everyone:

   I wanted to ask how VACUUM ANALYZE works. Is it possible for
something to happen that resets its effects, forcing me to run the
VACUUM ANALYZE command again? I am asking because I am struggling with
a query that runs fine after a VACUUM ANALYZE but then suddenly starts
to take forever, until I run another VACUUM ANALYZE, and so on...
   I'd like to point out that I am a novice when it comes to non-basic
PostgreSQL performance-related matters.

Thank you all in advance

Rafael





Re: [PERFORM] Vacuum Analyze problem

2007-09-04 Thread Mikko Partio
On 9/4/07, [EMAIL PROTECTED] wrote:

 Hello everyone:

    I wanted to ask how VACUUM ANALYZE works. Is it possible for
 something to happen that resets its effects, forcing me to run the
 VACUUM ANALYZE command again?



Yes: lots of modifications (INSERT, UPDATE, DELETE) to the table in question.
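
For instance, a minimal sketch with a hypothetical "orders" table:

    -- a full-table UPDATE leaves a dead version of every row behind
    UPDATE orders SET status = 'processed';
    -- VACUUM reclaims that dead space; ANALYZE refreshes the stats
    VACUUM ANALYZE orders;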

Regards

MP


Re: [PERFORM] Vacuum Analyze problem

2007-09-04 Thread rafael
 On 9/4/07, [EMAIL PROTECTED] wrote:

 Hello everyone:

    I wanted to ask how VACUUM ANALYZE works. Is it possible for
 something to happen that resets its effects, forcing me to run the
 VACUUM ANALYZE command again?



 Yes: lots of modifications (INSERT, UPDATE, DELETE) to the table in
 question.

 Regards

 MP

I knew that in the long run the VACUUM ANALYZE command has to be
executed again. My question is whether something can happen overnight
that creates the need for a new VACUUM ANALYZE (regenerating indexes or
something else related to performance).

Thanks for your reply.

Rafael





Re: [PERFORM] Vacuum Analyze problem

2007-09-04 Thread Bill Moran
In response to [EMAIL PROTECTED]:

 Hello everyone:

    I wanted to ask how VACUUM ANALYZE works. Is it possible for
 something to happen that resets its effects, forcing me to run the
 VACUUM ANALYZE command again? I am asking because I am struggling with
 a query that runs fine after a VACUUM ANALYZE but then suddenly starts
 to take forever, until I run another VACUUM ANALYZE, and so on...
    I'd like to point out that I am a novice when it comes to non-basic
 PostgreSQL performance-related matters.
 
 Thank you all in advance

To add to Mikko's comments:

Periodic vacuuming and analyzing is a mandatory part of running a
PostgreSQL database server.  You'll probably be best served by
configuring the autovacuum daemon to handle this for you.  See the
postgresql.conf config file.
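
A minimal sketch of the relevant postgresql.conf entries on 8.2
(autovacuum depends on row-level statistics collection):

    stats_start_collector = on
    stats_row_level = on
    autovacuum = on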

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

[EMAIL PROTECTED]
Phone: 412-422-3463x4023



[PERFORM] About autovacuum

2007-09-04 Thread Jean-David Beyer
I have this turned on, and if I look at the log, it runs once a minute,
which is fine.

But what does it do? I.e., it runs VACUUM, but does it also run ANALYZE?

-- 
  .~.  Jean-David Beyer  Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A Registered Machine   241939.
 /( )\ Shrewsbury, New Jersey  http://counter.li.org
 ^^-^^ 14:35:01 up 26 days, 17:57, 2 users, load average: 4.31, 4.40, 4.78



Re: [PERFORM] Vacuum Analyze problem

2007-09-04 Thread rafael
 On Tuesday 04 September 2007 11:27:07 [EMAIL PROTECTED] wrote:
 Hello everyone:

    I wanted to ask how VACUUM ANALYZE works. Is it possible for
 something to happen that resets its effects, forcing me to run the
 VACUUM ANALYZE command again? I am asking because I am struggling with
 a query that runs fine after a VACUUM ANALYZE but then suddenly starts
 to take forever, until I run another VACUUM ANALYZE, and so on...
    I'd like to point out that I am a novice when it comes to non-basic
 PostgreSQL performance-related matters.

 Thank you all in advance

 Rafael




 Rafael;

 VACUUM ANALYZE performs two tasks at once.

 1) Vacuum - scans the table pages and marks dead row space (rows from
 old updates or deletes that cannot be needed by any existing
 transaction) so that the database can re-use (over-write) that space.

 2) Analyze - like an Oracle compute statistics, updates the system
 catalogs with current statistics for the table.

 The vacuum improves queries because the dead space can be re-used. If
 the table you are having issues with is a high-volume table, the
 solution is generally to run vacuum more often - I've seen tables that
 needed a vacuum every 5 minutes due to significant sustained churn.

 The analyze, of course, is key for the planner. If the table is
 growing rapidly, then running analyze more often will help. If,
 however, there is lots of churn but little change in the data (i.e.
 lots of inserts followed by deletes of the same rows), then a straight
 vacuum is probably what you need. If the data is changing rapidly,
 bumping up the statistics target may help - you can raise it for
 individual columns with ALTER TABLE ... ALTER COLUMN ... SET
 STATISTICS, and per-table autovacuum thresholds live in the
 pg_autovacuum system catalog.
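
 A minimal sketch of those commands (table and column names are
 hypothetical):

     VACUUM mytable;            -- reclaim dead row space only
     ANALYZE mytable;           -- refresh planner statistics only
     VACUUM ANALYZE mytable;    -- both in one pass
     -- raise the statistics sample for one column (default is 10)
     ALTER TABLE mytable ALTER COLUMN somecol SET STATISTICS 100;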

 Hope this helps...

 /Kevin



Thank you all for the information. I'll get to work on it and see what
happens.
Thanks again

Rafael


Re: [PERFORM] Vacuum Analyze problem

2007-09-04 Thread Richard Broersma Jr
--- [EMAIL PROTECTED] wrote:
 Thank you all for the information. I'll get to work on it and see what
 happens.
 Thanks again
 
 Rafael

I'll chime in with one last pointer to some excellent resources on vacuuming:

http://www.postgresql.org/docs/8.2/static/sql-vacuum.html
http://www.postgresql.org/docs/8.2/static/sql-analyze.html
http://www.postgresql.org/docs/8.2/static/routine-vacuuming.html

Regards,
Richard Broersma Jr.



Re: [PERFORM] About autovacuum

2007-09-04 Thread Bill Moran
In response to Jean-David Beyer [EMAIL PROTECTED]:

 I have this turned on, and if I look at the log, it runs once a minute,
 which is fine.
 
 But what does it do? I.e., it runs VACUUM, but does it also run ANALYZE?

Yes.  If you turn up the debugging level, you'll see detailed log
messages about its activities.

There were discussions on other lists about improving autovacuum's log
messages; I'm pretty sure it will log more helpful information in 8.3.
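
For example, something like this in postgresql.conf should surface
autovacuum's per-table activity in the server log (a sketch; the exact
debug level needed may vary):

    log_min_messages = debug2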

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

[EMAIL PROTECTED]
Phone: 412-422-3463x4023



Re: [PERFORM] join tables vs. denormalization by trigger

2007-09-04 Thread Mark Lewis
On Tue, 2007-09-04 at 20:53 +0200, Walter Mauritz wrote:
 Hi,
 
 I wonder about differences in performance between two scenarios:
 
 Background:
 Table A, ~50,000 records
 Table B, ~3,000,000 records (~20 cols)
 Table C, ~30,000,000 records (~10 cols)
 
 a query every 3sec. with limit 10
 
 Table C depends on Table B, which depends on Table A; int8 foreign
 key, btree index
 
 * consider it a read-only scenario (data loaded only at night, with
 time for a daily vacuum analyze)
 * I'm required to show records from Table C, but also with some
 (~5 cols) of info from Table B
 * the where clause always contains the foreign key to Table A
 * the where clause may contain a further 1-10 search parameters
 
 
 Scenario A)
 simply inner join Table B + C
 
 Scenario B)
 with the use of a trigger on insert/update I could push the required
 information from table B down to table C
 - so I would only need to select from table C (see the sketch below).
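
 A sketch of the two query shapes, with hypothetical table and column
 names:

     -- Scenario A: join at query time
     SELECT c.*, b.col1, b.col2        -- the ~5 columns needed from B
     FROM table_c c
     JOIN table_b b ON b.id = c.b_id
     WHERE c.a_id = 42                 -- FK to Table A, always present
     LIMIT 10;

     -- Scenario B: trigger-maintained copies of B's columns on C
     SELECT c.*                        -- B's columns already live on C
     FROM table_c c
     WHERE c.a_id = 42
     LIMIT 10;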
 
 
 My question:
 1) From your experience ... approximately how much faster, in percent,
 would you expect Scenario B to be than A?

You're assuming that B is always going to be faster than A, which
certainly isn't a foregone conclusion.  Let's say that you average 10
bytes per column.  In scenario A, the total data size is then roughly
3,000,000 * 20 * 10 + 30,000,000 * 10 * 10 = 3.6 GiB.  In scenario B due
to your denormalization, the total data size is more like 30,000,000 *
30 * 10 = 9 GiB, or 2.5 times more raw data.

That's a lot of extra disk IO, unless your database will always fit in
memory in both scenarios.
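
If you want to measure the real footprint rather than estimate it,
something along these lines works on 8.1 and later (hypothetical table
names):

    SELECT pg_size_pretty(pg_total_relation_size('table_b')) AS b_size,
           pg_size_pretty(pg_total_relation_size('table_c')) AS c_size;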

Although you didn't provide enough data to answer with certainty, I
would go on the assumption that A is going to be faster than B.  But
even if it weren't, remember that premature optimization is the root of
all evil.  If you try A and it doesn't perform fast enough, then you can
always try B later to see if it works any better.

-- Mark Lewis



[PERFORM] Performance on 8 CPUs and 32GB of RAM

2007-09-04 Thread Carlo Stonebanks
A client is moving their PostgreSQL db to a brand new Windows 2003 x64
server with 2 quad cores and 32GB of RAM. It is a dedicated server,
running 8.2.4.


The server will typically have fewer than 10 users. The primary use of
this server is to host a database that is continuously being updated by
data consolidation and matching software that hits the server very
hard. There are typically eight such processes running at any one time.
The software makes extensive use of PostgreSQL's native fuzzy string
matching. The SQL is dynamically generated by the software and consists
of large, complex joins (the structure of the joins changes as the
software adapts its matching strategies).


I would like to favour the needs of the data matching software; the
server is almost exclusively dedicated to PostgreSQL.


I have made some tentative modifications to the default
postgresql.conf file (see below), but I don't think I've scratched the
surface of what this new system is capable of. Can I ask - given my
client's needs, this new, powerful server, and the fact that the server
typically has a small number of extremely busy processes - which
numbers you would change, and what your recommendations would be?


Thanks!

Carlo

max_connections = 100
shared_buffers = 10
work_mem = 100
max_fsm_pages = 204800
max_fsm_relations = 1500
vacuum_cost_delay = 40
bgwriter_lru_maxpages = 100
bgwriter_all_maxpages = 100
checkpoint_segments = 64
checkpoint_warning = 290
effective_cache_size = 375000
stats_command_string = on
stats_start_collector = on
stats_row_level = on
autovacuum = on
autovacuum_naptime = 1min
autovacuum_vacuum_threshold = 500
autovacuum_analyze_threshold = 250
autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_scale_factor = 0.1




Re: [PERFORM] join tables vs. denormalization by trigger

2007-09-04 Thread Marc Mamin
Hello,

I had a similar issue and - after testing - decided to merge tables B
and C into a single table.
In my case the resulting table contains a large proportion of NULLs,
which limits the size increase...
You'll have to do some testing with your data to evaluate the
performance gain.

Hope this helps,

Marc  

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Walter
Mauritz
Sent: Tuesday, September 04, 2007 8:53 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] join tables vs. denormalization by trigger

Hi,

I wonder about differences in performance between two scenarios:

Background:
Table A, ~50,000 records
Table B, ~3,000,000 records (~20 cols)
Table C, ~30,000,000 records (~10 cols)

a query every 3sec. with limit 10

Table C depends on Table B, which depends on Table A; int8 foreign key,
btree index

* consider it a read-only scenario (data loaded only at night, with
time for a daily vacuum analyze)
* I'm required to show records from Table C, but also with some
(~5 cols) of info from Table B
* the where clause always contains the foreign key to Table A
* the where clause may contain a further 1-10 search parameters


Scenario A)
simply inner join Table B + C

Scenario B)
with the use of a trigger on insert/update I could push the required
information from table B down to table C
- so I would only need to select from table C.


My question:
1) From your experience ... approximately how much faster, in percent,
would you expect Scenario B to be than A?

2) any other tips for such a read only scenario

Thx for any attention :-)
Walter



Re: [PERFORM] Performance on 8 CPUs and 32GB of RAM

2007-09-04 Thread Scott Marlowe
On 9/4/07, Carlo Stonebanks [EMAIL PROTECTED] wrote:
 A client is moving their PostgreSQL db to a brand new Windows 2003 x64
 server with 2 quad cores and 32GB of RAM. It is a dedicated server,
 running 8.2.4.

And what does the drive subsystem look like?  All that horsepower
isn't going to help if all your data is sitting on an inferior drive
subsystem.

 The server will typically have fewer than 10 users. The primary use of
 this server is to host a database that is continuously being updated
 by data consolidation and matching software that hits the server very
 hard. There are typically eight such processes running at any one
 time. The software makes extensive use of PostgreSQL's native fuzzy
 string matching. The SQL is dynamically generated by the software and
 consists of large, complex joins (the structure of the joins changes
 as the software adapts its matching strategies).

 I would like to favour the needs of the data matching software; the
 server is almost exclusively dedicated to PostgreSQL.

 I have made some tentative modifications to the default
 postgresql.conf file (see below), but I don't think I've scratched the
 surface of what this new system is capable of. Can I ask - given my
 client's needs, this new, powerful server, and the fact that the
 server typically has a small number of extremely busy processes -
 which numbers you would change, and what your recommendations would
 be?

 Thanks!

 Carlo

 max_connections = 100
 shared_buffers = 10
 work_mem = 100

Even with only 10 users, 1 gig of work_mem is extremely high (without
a unit, work_mem is set in kB on 8.2.x).  A much smaller value would be
more reasonable.

OTOH, shared_buffers, at 10, is only setting it to 100 meg.  That's
pretty small on a machine with 32 gig.  Also, I recommend setting
values in a more readable form, like 500MB, in postgresql.conf.  Much
easier to read than a raw page count...

 effective_cache_size = 375000

This seems low by an order of magnitude or two.
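
Putting those observations together, something like this (illustrative
values only, not a tuned recommendation - 8.2 accepts unit suffixes in
postgresql.conf):

    shared_buffers = 500MB        # instead of a raw count of 8kB pages
    work_mem = 16MB               # per sort/hash, so keep it modest
    effective_cache_size = 24GB   # roughly what the OS can cache of 32GB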

But the most important thing is what you've left out: what kind of I/O
does this machine have?  It's really important for something that
sounds like an OLAP server.



Re: [PERFORM] Performance on 8 CPUs and 32GB of RAM

2007-09-04 Thread Alvaro Herrera
Carlo Stonebanks wrote:
 A client is moving their PostgreSQL db to a brand new Windows 2003 x64
 server with 2 quad cores and 32GB of RAM. It is a dedicated server,
 running 8.2.4.

Large shared_buffers and Windows do not mix.  Perhaps you should leave
the shmem config low, so that the kernel can cache the file pages.

 The server will typically have fewer than 10 users. The primary use of
 this server is to host a database that is continuously being updated
 by data consolidation and matching software that hits the server very
 hard. There are typically eight such processes running at any one
 time. The software makes extensive use of PostgreSQL's native fuzzy
 string matching. The SQL is dynamically generated by the software and
 consists of large, complex joins (the structure of the joins changes
 as the software adapts its matching strategies).

It sounds like you will need quite a lot of vacuuming effort to keep
up.  Maybe you should lower the autovacuum scale factors so that your
tables are visited more frequently.  A vacuum_cost_delay of 40 sounds
like too much, though.
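
A sketch of what that might look like (illustrative values, to be tuned
against the actual churn; the posted config had 0.2, 0.1 and 40):

    autovacuum_vacuum_scale_factor = 0.05
    autovacuum_analyze_scale_factor = 0.02
    vacuum_cost_delay = 10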

Since you didn't describe your disk configuration, it is most likely not
really prepared to handle high I/O load.  Maybe you should fix that.

-- 
Alvaro Herrera                      http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] Performance on 8 CPUs and 32GB of RAM

2007-09-04 Thread Scott Marlowe
On 9/4/07, Alvaro Herrera [EMAIL PROTECTED] wrote:
 Carlo Stonebanks wrote:
  A client is moving their PostgreSQL db to a brand new Windows 2003 x64
  server with 2 quad cores and 32GB of RAM. It is a dedicated server,
  running 8.2.4.

 Large shared_buffers and Windows do not mix.  Perhaps you should leave
 the shmem config low, so that the kernel can cache the file pages.

Egads, I'd completely missed the word Windows up there.

I would highly recommend building the PostgreSQL server on a unixish
OS.  Even with minimal tuning, I'd expect the same box running Linux
or FreeBSD to stomp Windows pretty heavily in the performance
department.

But yeah, the I/O, that's the big one.  If it's just a single IDE
drive or a couple of them, it's not gonna be able to handle much load.



Re: [PERFORM] Vacuum Analyze problem

2007-09-04 Thread Gregory Stark
[EMAIL PROTECTED] writes:

 I knew that in the long run the VACUUM ANALYZE command has to be
 executed again. My question is whether something can happen overnight
 that creates the need for a new VACUUM ANALYZE (regenerating indexes
 or something else related to performance).

The answer to your question is possibly yes, for two reasons:

1) If you're running an autovacuum daemon, it might decide it's time to
vacuum the table and kick off a vacuum. That sounds most like what
you're describing.

2) If the size of the table changes substantially, becoming much larger
(or smaller, though that wouldn't happen just due to deletes unless you
run vacuum), then recent versions of Postgres will notice, even if you
don't run analyze, and take that into account.
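
You can see the size figures the planner starts from; a minimal sketch
(hypothetical table name):

    -- relpages/reltuples are the per-table figures ANALYZE maintains
    -- and the planner scales against the table's current physical size
    SELECT relname, relpages, reltuples
    FROM pg_class
    WHERE relname = 'mytable';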

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
