Re: [PERFORM] Performance Bottleneck

2004-08-04 Thread Pierre-Frédéric Caillaud

The queries themselves are simple, normally drawing information from one
table with few conditions or, in the most complex cases, using joins on
two tables or subqueries. These behave very well and always have; the
problem is that these queries take place in rather large numbers due to
the dumb nature of the scripts themselves.
Hmm, maybe this dumb thing is where to look?
	I'm no expert, but I have had the same situation with a very dumb PHP
application, namely osCommerce, which averaged about 140 (!) queries
on a page!

	I added some traces and logging to the queries, only to see that the
stupid programmers did something like (pseudo code):

for id in id_list:
    select stuff from database where id = id
Geee...
I replaced it with:
select stuff from database where id in (id_list)
	And this saved about 20 requests... The code was peppered with queries like
that. In the end it went from 140 queries to about 20, which is still way
too much IMHO, but I couldn't go lower without an extensive rewrite.

	If you have a script making many selects, it's worth grouping them, even  
using stored procedures.

	For instance using the classical tree in a table to store a tree of  
product categories :

create table categories
(
    id serial primary key,
    parent_id integer references categories(id),
    ... other columns ...
);
You basically have these choices in order to display the tree :
- select for parent_id=0 (root)
- for each element, select its children
- and so on
OR
	- make a stored procedure which does that (a sketch follows after this list). At least 3x faster and a lot
less CPU overhead.

OR (if you have say 50 rows in the table which was my case)
- select the entire table and build your tree in the script
It was a little bit faster than the stored procedure.
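
	To make the stored-procedure option above concrete, here is a rough sketch of a recursive PL/pgSQL set-returning function. It assumes a hypothetical name column on categories and parent_id = 0 for the roots, and it is written for a current PostgreSQL (on 7.4 the function body would need old-style single-quoted escaping); treat it as an illustration, not the exact function discussed.

-- Sketch only: fetches the whole tree in one call instead of one SELECT per node.
-- The "name" column is assumed for illustration.
CREATE OR REPLACE FUNCTION category_tree(integer, integer)
RETURNS SETOF text AS $$
DECLARE
    p_parent ALIAS FOR $1;   -- parent id to start from (0 = root)
    p_depth  ALIAS FOR $2;   -- current indentation level
    cat  RECORD;
    line text;
BEGIN
    FOR cat IN SELECT id, name FROM categories
               WHERE parent_id = p_parent ORDER BY name
    LOOP
        -- emit this category, indented by its depth
        RETURN NEXT repeat('  ', p_depth) || cat.name;
        -- then recurse into its children
        FOR line IN SELECT * FROM category_tree(cat.id, p_depth + 1) LOOP
            RETURN NEXT line;
        END LOOP;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;

-- Usage: SELECT * FROM category_tree(0, 0);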
	Could you give an example of your dumb scripts ? It's good to optimize a  
database, but it's even better to remove useless queries...






---(end of broadcast)---
TIP 8: explain analyze is your friend


[PERFORM] The black art of postgresql.conf tweaking

2004-08-04 Thread Paul Serby
Can anyone give a good reference site/book for getting the most out of 
your postgres server?

All I can find is contradicting theories on how to work out your settings.
This is what I followed to setup our db server that serves our web 
applications.

http://www.phpbuilder.com/columns/smith20010821.php3?page=2
We have a Dell Poweredge with the following spec.
CPU: Intel(R) Xeon(TM) CPU 3.06GHz (512 KB Cache)
CPU: Intel(R) Xeon(TM) CPU 3.06GHz (512 KB Cache)
CPU: Intel(R) Xeon(TM) CPU 3.06GHz (512 KB Cache)
CPU: Intel(R) Xeon(TM) CPU 3.06GHz (512 KB Cache)
Physical Memory: 2077264 kB
Swap Memory: 2048244 kB
Apache on the Web server can take up to 300 connections and PHP is using 
 pg_pconnect

Postgres is set with the following.
max_connections = 300
shared_buffers = 38400
sort_mem = 12000
But Apache is still maxing out the non-super user connection limit.
The machine is under no load and I would like to up the max_connections 
but I would like to know more about what you need to consider before 
doing so.

The only other source I've found is this:
http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html
But following its method my postgres server locks up straight away as it 
recommends setting max_connections to 16 for Web sites?

Is there a scientific method for optimizing postgres, or is it all 
'finger in the air' and trial and error?

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


[PERFORM] Tuning queries on large database

2004-08-04 Thread Valerie Schneider DSI/DEV
Hi,

I have some performance problems on a PG database, and I don't
know how to improve things. I have two questions: one about the storage
of data, one about tuning queries. If possible!

My job is to compare Oracle and Postgres. All our operational databases
have been running under Oracle for about fifteen years. Now I am trying to
replace Oracle with Postgres.

I have a test platform under Linux (Dell server, 4 GB RAM, bi-processor,
Linux Red Hat 9 (2.4.20-31.9)) with 2 databases, 1 with Oracle
(8i or 9i, it's much the same), 1 with PG (7.4.2). Both databases
have the same structure and the same content, about 100 GB each. I developed
some benchmarks, representative of our use of the databases. My problem
is that I have tables (relations) with more than 100 million rows,
where each row has about 160 fields and an average size of 256 bytes.

For Oracle I have an SGA size of 500 MB.
For PG my postgresql.conf has:
max_connections = 1500
shared_buffers = 3
sort_mem = 5
effective_cache_size = 20
and default values for the other parameters.

I have a table named data which looks like this:

bench=> \d data
              Table "public.data"
   Column   |            Type             | Modifiers 
------------+-----------------------------+-----------
 num_poste  | numeric(9,0)                | not null
 dat        | timestamp without time zone | not null
 datrecu    | timestamp without time zone | not null
 rr1        | numeric(5,1)                | 
 qrr1       | numeric(2,0)                | 
 ... (all remaining fields are numeric) ...
Indexes:
    "pk_data" primary key, btree (num_poste, dat)
    "i_data_dat" btree (dat)

It contains 1000 different values of num_poste and for each one 
125000 different values of dat (1 row per hour, 15 years). 

I run a vacuum analyze of the table.

bench=> select * from tailledb ;
 schema | relfilenode |  table   |   index    |  reltuples  |   size   
--------+-------------+----------+------------+-------------+----------
 public |   125615917 | data     |            | 1.25113e+08 | 72312040
 public |   251139049 | data     | i_data_dat | 1.25113e+08 |  2744400
 public |   250870177 | data     | pk_data    | 1.25113e+08 |  4395480

My first remark is that the table takes a lot of space on disk, about
70 GB, instead of 35 GB with Oracle.
125,000,000 rows x 256 bytes = about 32 GB. That estimate is not far off
for Oracle. What about PG? How is the data stored?


The different queries of the bench are simple queries (no joins,
sub-queries, ...) and they use indexes (I EXPLAINed each one to
be sure); a sketch of Q1 appears after this list:
Q1 select_court : access to about 700 rows  : 1 num_poste and 1 month
(using PK : num_poste=p1  and dat between p2 and p3)
Q2 select_moy   : access to about 7000 rows : 10 num_poste and 1 month
(using PK : num_poste between p1 and p1+10 and dat between p2 and p3)
Q3 select_long  : about 250 000 rows: 2 num_poste 
(using PK : num_poste in (p1,p1+2))
Q4 select_tres_long : about 3 million rows : 25 num_poste 
(using PK : num_poste between p1 and p1 + 25)
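
For reference, Q1 as described above would look something like this (the literal values are invented for illustration; p1, p2, p3 are the real parameters):

-- Q1 (select_court): one num_poste and one month, resolved through the
-- primary key index on (num_poste, dat). Values are illustrative.
SELECT *
FROM data
WHERE num_poste = 42
  AND dat BETWEEN '1990-01-01 00:00:00' AND '1990-01-31 23:59:59';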

The result is that the short queries (Q1 and Q2) run in a few
seconds on both Oracle and PG. The difference becomes important with
Q3 :  8 seconds with Oracle
     80 seconds with PG
and far worse with Q4 : 28 s   with Oracle
                        17m20s with PG!

Of course, when I run 100 or 1000 parallel queries such as Q3 or Q4,
it becomes a disaster!
I can't understand these results. The way the queries are executed is the
same, I think. I've read the recommended articles on the PG site.
I tried with a table containing 30 million rows; the results are similar.

What can I do ?

Thanks for your help ! 


* The views expressed here are strictly personal and do not       *
* engage the responsibility of METEO-FRANCE.                      *
* Valerie SCHNEIDER              Tel : +33 (0)5 61 07 81 91       *
* METEO-FRANCE / DSI/DEV         Fax : +33 (0)5 61 07 81 09       *
* 42, avenue G. Coriolis         Email : [EMAIL PROTECTED]        *
* 31057 TOULOUSE Cedex - FRANCE  http://www.meteo.fr              *



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] The black art of postgresql.conf tweaking

2004-08-04 Thread Jeff
On Aug 4, 2004, at 8:45 AM, Paul Serby wrote:
Apache on the Web server can take up to 300 connections and PHP is 
using  pg_pconnect

Postgres is set with the following.
max_connections = 300
shared_buffers = 38400
sort_mem = 12000
But Apache is still maxing out the non-super user connection limit.
Did you restart PG after making that change?
(you need to restart, reload won't change max_connections)
Also, your sort_mem is likely too high (that is the amount of memory 
that can be used PER SORT) and you should back down on shared_buffers. 
(General consensus is don't go over 10k shared buffers.)

Another thing you may want to try is using pgpool and regular 
pg_connect - this way you only have a pool of, say, 32 connections to 
the DB that are shared among all apache instances. This gets rid of 
the need to have hundreds of idle postgres backends sitting around.  
Connecting to pgpool is very fast. We use it in production here and it 
works wonderfully. And it is 100% transparent to your application.

--
Jeff Trout [EMAIL PROTECTED]
http://www.jefftrout.com/
http://www.stuarthamm.net/
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] The black art of postgresql.conf tweaking

2004-08-04 Thread Michal Taborsky
Paul Serby wrote:
Apache on the Web server can take up to 300 connections and PHP is using 
 pg_pconnect

max_connections = 300
But Apache is still maxing out the non-super user connection limit.
Don't forget also that some connections are reserved for superusers 
(usually 2), so if you want 300 users, you need to set max_connections 
to 300 + superuser_reserved_connections.
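
For example, in postgresql.conf (the figures are illustrative):

# 300 connections for the application, plus 2 slots kept free for superusers
max_connections = 302
superuser_reserved_connections = 2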

--
Michal Taborsky
http://www.taborsky.cz
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Tuning queries on large database

2004-08-04 Thread Rod Taylor
On Wed, 2004-08-04 at 08:44, Valerie Schneider DSI/DEV wrote:
 Hi,
 
 I have some problem of performance on a PG database, and I don't
 know how to improve. I Have two questions : one about the storage
 of data, one about tuning queries. If possible !
 
 My job is to compare Oracle and Postgres. All our operational databases
 have been running under Oracle for about fifteen years. Now I try to replace
 Oracle by Postgres.

You may assume some additional hardware may be required -- this would be
purchased out of the Oracle License budget :)

 My first remark is that the table takes a lot of place on disk, about
 70 Gb, instead of 35 Gb with oracle.
 125 000 000 rows x 256 b = about 32 Gb. This calculation gives an idea
 not so bad for oracle. What about for PG ? How data is stored ?

This is due to the datatype you've selected. PostgreSQL does not convert
NUMERIC into a more appropriate integer format behind the scenes, nor
will it use the faster routines for the math when it is an integer.
Currently it makes the assumption that if you've asked for numeric
rather than integer or float, you are dealing with either large
numbers or require high-precision math.

Changing most of your columns to integer + Check constraint (where
necessary) will give you a large speed boost and reduce disk
requirements a little.
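
As a sketch of what that might look like for a few of the columns from the \d output earlier in the thread (the CHECK ranges are invented; pick ones that match the real data):

-- Numeric columns recast as integer types, with CHECK constraints where needed.
CREATE TABLE data (
    num_poste integer NOT NULL,                        -- was numeric(9,0)
    dat       timestamp without time zone NOT NULL,
    datrecu   timestamp without time zone NOT NULL,
    rr1       numeric(5,1),                            -- keeps its decimal digit
    qrr1      smallint CHECK (qrr1 BETWEEN 0 AND 99),  -- was numeric(2,0)
    -- ... remaining fields as smallint/integer where the data allows ...
    PRIMARY KEY (num_poste, dat)
);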

 The different queries of the bench are simple queries (no join,
 sub-query, ...) and are using indexes (I explained each one to
 be sure) :

Care to send us the EXPLAIN ANALYZE output for each of the 4 queries
after you've improved the datatype selection?

-- 
Rod Taylor rbt [at] rbt [dot] ca

Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/signature.asc




Re: [PERFORM] The black art of postgresql.conf tweaking

2004-08-04 Thread Janning Vygen
On Wednesday, 4 August 2004 at 14:45, Paul Serby wrote:
 Apache on the Web server can take up to 300 connections and PHP is using
   pg_pconnect

 Postgres is set with the following.

 max_connections = 300
 shared_buffers = 38400
 sort_mem = 12000

 But Apache is still maxing out the non-super user connection limit.

for most websites 300 connections is far too much (imagine even 10 requests per 
second for 10 hours a day: that ends up at about 10.8 million pages a month)

but anyway: you should first focus on closing your http connection to the user 
as fast as possible. Then you don't need as many concurrent connections, which 
keep db connections open and use memory.

I did the following (a config sketch follows the list):
- apache: keepalive off
- apache patch: lingerd (google for it)
- apache mod_gzip
- pg_pconnect
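
A rough httpd.conf sketch of those Apache settings (lingerd comes from the separate patch mentioned above, and the mod_gzip lines assume the module is installed and loaded):

# httpd.conf (sketch)
KeepAlive Off                              # close the connection after each request

<IfModule mod_gzip.c>
    mod_gzip_on           Yes              # compress responses
    mod_gzip_item_include mime ^text/.*    # at least text/html, text/css, ...
</IfModule>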

this keeps your http connection as short as possible, so the apache child is 
ready to serve the next client. 

Imagine 5 seconds of keepalive, 1 second lingering on half-closed tcp 
connections, and 4 more seconds for transport of uncompressed content.

in this scenario your apache child uses memory and your pooled db connection 
for 10 seconds while doing nothing!

in my experience apache in its standard configuration can be the main bottleneck, 
and the combination of keepalive off, lingerd and mod_gzip is GREAT. I haven't 
found many sites recommending a configuration like this.

kind regards,
janning

p.s: sorry for being slightly off topic and talking about apache but when it 
comes to performance it is always important to look at the complete system.

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] The black art of postgresql.conf tweaking

2004-08-04 Thread Paul Thomas
On 04/08/2004 13:45 Paul Serby wrote:
Can anyone give a good reference site/book for getting the most out of 
your postgres server.

All I can find is contradicting theories on how to work out your 
settings.

This is what I followed to setup our db server that serves our web 
applications.

http://www.phpbuilder.com/columns/smith20010821.php3?page=2
We have a Dell Poweredge with the following spec.
CPU: Intel(R) Xeon(TM) CPU 3.06GHz (512 KB Cache)
CPU: Intel(R) Xeon(TM) CPU 3.06GHz (512 KB Cache)
CPU: Intel(R) Xeon(TM) CPU 3.06GHz (512 KB Cache)
CPU: Intel(R) Xeon(TM) CPU 3.06GHz (512 KB Cache)
Physical Memory: 2077264 kB
Swap Memory: 2048244 kB
Apache on the Web server can take up to 300 connections and PHP is 
using  pg_pconnect

Postgres is set with the following.
max_connections = 300
shared_buffers = 38400
Might be higher than necessary. Some people reckon that there's no 
measurable performance gain going above ~10,000 buffers.


sort_mem = 12000
Do you really need 12MB of sort memory? Remember that this is per sort, 
per connection, so with 300 connections you could end up with 300 x 12MB 
(roughly 3.5GB) being allocated in a worst-case scenario on a box with 2GB of RAM.

But Apache is still maxing out the non-super user connection limit.
The machine is under no load and I would like to up the max_connections 
but I would like to know more about what you need to consider before 
doing so.
I can't think why you should be maxing out when under no load. Maybe you 
need to investigate this further.

The only other source I've found is this:
http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html
But following its method my postgres server locks up straight away as it 
recommends setting max_connections to 16 for Web sites?
I think you've mis-interpreted that. She's talking about using persistent 
connections - i.e., connection pooling.

Is there a scientific method for optimizing postgres or is it all 
'finger in the air' and trial and error.
Posting more details of the queries which are giving the performance 
problems will enable people to help you. You're vacuum/analyzing regularly 
of course ;) People will want to know:

- PostgreSQL version
- hardware configuration (SCSI or IDE? RAID level?)
- table schemas
- queries together with EXPLAIN ANALYZE output
Output from utilities like vmstat and top may also be of use.
HTH
--
Paul Thomas
+------------------------------+--------------------------------------------+
| Thomas Micro Systems Limited | Software Solutions for Business            |
| Computer Consultants         | http://www.thomas-micro-systems-ltd.co.uk  |
+------------------------------+--------------------------------------------+

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Tuning queries on large database

2004-08-04 Thread Pierre-Frédéric Caillaud

not so bad for oracle. What about for PG ? How data is stored
	I agree with the datatype issue. Smallint, bigint, integer... add a  
constraint...

	Also, the order of the records in the database is very important. As  
you seem to have a very large static population in your table, you should  
insert it ordered by your favourite selection index (looks like that would be  
num_poste).

	Also, you have a lot of static data which pollutes your table. Why not  
create two tables, one for the current year and one for all the past  
years, and use a view to present a merged picture (see the sketch below)?
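
	A sketch of that layout (the column list is abbreviated and the names are invented):

-- Current year and archive split, presented as one merged view.
CREATE TABLE data_current (
    num_poste numeric(9,0) NOT NULL,
    dat       timestamp without time zone NOT NULL,
    -- ... the other measurement columns ...
    PRIMARY KEY (num_poste, dat)
);

CREATE TABLE data_archive (
    num_poste numeric(9,0) NOT NULL,
    dat       timestamp without time zone NOT NULL,
    -- ... same columns ...
    PRIMARY KEY (num_poste, dat)
);

CREATE VIEW data_all AS
    SELECT * FROM data_current
    UNION ALL
    SELECT * FROM data_archive;

	Queries that only touch recent data then scan the small table, while existing queries can keep using the merged view.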

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Tuning queries on large database

2004-08-04 Thread Pierre-Frédéric Caillaud
	You often compute sums. Why not use separate tables to cache these sums by  
month, by poste, by whatever?

	A rule on insert on the big table updates the cache tables; a sketch follows.
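
	For example (the summary table and its columns are invented, and the rule assumes one pre-populated row per poste and month; a trigger would do the same job):

-- Monthly sum cache plus a rule that keeps it current on insert.
CREATE TABLE monthly_sums (
    num_poste numeric(9,0) NOT NULL,
    month     date         NOT NULL,
    sum_rr1   numeric      NOT NULL DEFAULT 0,
    PRIMARY KEY (num_poste, month)
);

CREATE RULE data_cache_upd AS
    ON INSERT TO data
    DO UPDATE monthly_sums
          SET sum_rr1 = sum_rr1 + COALESCE(NEW.rr1, 0)
        WHERE num_poste = NEW.num_poste
          AND month = date_trunc('month', NEW.dat)::date;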
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


FW: [PERFORM] Tuning queries on large database

2004-08-04 Thread Merlin Moncure
[forwarded to performance] 
 The result is that for short queries (Q1 and Q2) it runs in a few
 seconds on both Oracle and PG. The difference becomes important with
 Q3 : 8 seconds with oracle
  80 sec with PG
 and too much with Q4 : 28s with oracle
17m20s with PG !
 
 Of course when I run 100 or 1000 parallel queries such as Q3 or Q4,
 it becomes a disaster !
 I can't understand these results. The way to execute queries is the
 same I think. I've read recommended articles on the PG site.
 I tried with a table containing 30 millions rows, results are similar.


I don't trust the Oracle #s.  Let's look at Q4: it returns 3 million rows.
Using your #s of 160 fields and 256 bytes, you are asking for a result
set of 160 * 256 * 3M bytes, on the order of 120 GB!  This data has to be
gathered from disk, assembled, and sent over the network.

I don't know Oracle, but it probably has some 'smart' result set that
uses a cursor behind the scenes to do the fetching.

With a 3M row result set, you need to strongly consider using cursors.
Try experimenting with the same query (Q4) declared as a cursor, and
fetch the data in 10k blocks in a loop (FETCH 10000), and watch the #s
fly.
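
In SQL terms the loop looks something like this (the bounds stand in for p1 .. p1+25):

-- Q4 driven through a cursor, pulled 10,000 rows at a time instead of all at once.
BEGIN;

DECLARE q4_cur CURSOR FOR
    SELECT * FROM data
    WHERE num_poste BETWEEN 1 AND 26;   -- illustrative bounds

FETCH 10000 FROM q4_cur;   -- repeat from the application until no rows come back

CLOSE q4_cur;
COMMIT;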

Merlin


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Performance Bottleneck

2004-08-04 Thread Martin Foster
Gaetano Mendola wrote:
Martin Foster wrote:
Gaetano Mendola wrote:
Martin Foster wrote:
I run a Perl/CGI driven website that makes extensive use of 
PostgreSQL (7.4.3) for everything from user information to 
formatting and display of specific sections of the site. The 
server itself is a dual-processor AMD Opteron 1.4GHz with 2GB RAM and 
2 x 120GB hard drives mirrored for redundancy, running under FreeBSD 
5.2.1 (AMD64).

Recently, loads on the site have increased during peak hours to the 
point of showing considerable loss in performance. This can be 
observed when connections move from about 120 concurrent connections 
to PostgreSQL to roughly 175 or more. Essentially, the machine 
seems to struggle to keep up with continual requests and slows down 
as resources are tied up.

Code changes have been made to the scripts to essentially back off 
in high-load working environments, which has worked to an extent. 
However, as loads continue to increase, the database itself is not 
taking well to the increased traffic.

Having taken a look at 'Tuning PostgreSQL for Performance' 
(http://www.varlena.com/GeneralBits/Tidbits/perf.html), I used it as 
best I could to choose my settings. However, even with 
statistics disabled and every setting tweaked, things still continue 
to deteriorate.

Is there anything anyone can recommend in order to give the system a 
necessary speed boost? It would seem to me that a modest dataset 
of roughly a gig combined with that type of hardware should be able 
to handle substantially more load than it does. Can anyone 
provide me with clues as to where to look? Would disabling 'fsync' 
provide more performance if I accept that information may be lost in 
case of a crash?

If anyone needs access to logs, settings, et cetera, please ask; I 
simply wish to test the waters first on what is needed. Thanks!


Tell us about your typical queries and show us your configuration file.
Is the access read-only, or do you have concurrent writers and readers?
During peak hours are your processors tied up at 100%?
What do vmstat and iostat say?
Maybe you are not using indexes somewhere, or maybe you are but the
planner is not using them... In two words, we need more information
in order to help you.

Regards
Gaetano Mendola

I included all the files as attachments, which will hopefully cut down 
on follow-up emails. As for things like connection pooling, 
the web server makes use of Apache::DBI to pool the connections for 
the Perl scripts being driven on that server. For the sake of being 
thorough, a quick 'apachectl status' was thrown in when the database 
was under a good load.

Let's start from your postgres configuration:
shared_buffers = 8192 (this is really too small for your configuration)
sort_mem = 2048
wal_buffers = 128 (this is really too small for your configuration)
effective_cache_size = 16000

Change these values to:
shared_buffers = 5
sort_mem = 16084
wal_buffers = 1500
effective_cache_size = 32000

To bump up the shm usage you have to configure your OS in order to be
allowed to use that amount of SHM.
These are the numbers that I feel are good for your HW; the second step now
is to analyze your queries.
The queries themselves are simple, normally drawing information from 
one table with few conditions or, in the most complex cases, using joins 
on two tables or subqueries. These behave very well and always have; 
the problem is that these queries take place in rather large numbers 
due to the dumb nature of the scripts themselves.

Show us the EXPLAIN ANALYZE output for those queries and how many rows the
tables contain; the table schema could also be useful.

regards
Gaetano Mendola
I will look into moving up those values and seeing how they interact 
with the system once I get back from work. Since it was requested, I 
have a visual representation of an older schema, one that was used under 
MySQL. Note that all of the timestamps are now properly set to 
LOCALTIME on PostgreSQL.

http://prdownloads.sourceforge.net/ethereal-realms/ethereal-3_0_0.png?download
The row counts for tables of note are as follows:
 Puppeteer  1606
 Puppet     33176
 Realm      83
 Post       36156
 Audit      61961
The post table is continually cleared of old information since the 
nature of the information is very time-critical and archiving would only 
hinder performance. As a result, its size will vary wildly based on the time 
of day, since users (Puppeteers) tend to post more during peak hours.

NOTE:   The scripts make use of different schemas with the same
information in order to virtualize the script and
support more than one site on the same hardware.
On a side note, this would be a normal post-authentication session, once 
in a realm, for getting new posts:
 * Script is executed and schema is determined through stored procedure;
 * Formatting information is fetched from Tag and RealmDesign as needed;