Re: [PERFORM] Re: New server to improve performance on our large and busy DB - advice? (v2)

2010-01-20 Thread Carlo Stonebanks

* A database that is of small to medium size (5 - 10 GB)?
* Around 10 clients that perform constant write operations to the database 
(UPDATE/INSERT)

* Around 10 clients that occasionally read from the database
* Around 6000 tables in your database
* A problem with tuning it all
* Migration to new hardware and/or OS

Is this all correct?


Actually, the tablespace is very large, over 500GB. However, the actual 
production DB is 200GB.


The first thing that is noticeable is that you seem to have way too few drives 
in the server - not because of the disk space required but because of speed. 
You didn't say what type of drives you have or what you would consider 
desirable performance levels, but offhand (because of the "10 clients 
performing constant writes" part) you will probably want at least 2x-4x more 
drives.



With only 4 drives, RAID 10 is the only thing usable here.


What would be the optimum RAID level and number of disks?


 2) Which Windows OS would you recommend? (currently 2008 x64 Server)

Would not recommend Windows OS.


We may be stuck, as my client is only considering Red Hat Linux (still 
waiting to find out which version). If it turns out that this platform 
doesn't give better than a marginal improvement, then there is no incentive 
to create more complications in what is basically a Windows shop (although 
the project manager is a Linux advocate).


Most importantly, you didn't say what you would consider desirable 
performance. The hardware and the setup you described will work, but not 
necessarily fast enough.


Once again, it seems as though we are down to the number of drives...

Have you tried decreasing random_page_cost in postgresql.conf? Or setting 
(as a last resort) enable_seqscan = off?


In critical code sections, we do - we have stored procedures and code 
segments which save the current enable_seqscan value, set it to off (local 
to the transaction), then restore it after the code has run. Our current 
planner cost values are all default. Is this what you would choose for an 
Intel Core 2 Quad with 48 GB RAM?
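
For reference, our pattern is roughly equivalent to this sketch (SET LOCAL 
reverts automatically at COMMIT or ROLLBACK, so no explicit save/restore 
step is strictly needed):

BEGIN;
-- Affects only this transaction; reverts automatically at COMMIT/ROLLBACK.
SET LOCAL enable_seqscan = off;
-- ... run the critical, index-dependent queries here ...
COMMIT;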


# - Planner Cost Constants -
#seq_page_cost = 1.0   # measured on an arbitrary scale
#random_page_cost = 4.0   # same scale as above
#cpu_tuple_cost = 0.01   # same scale as above
#cpu_index_tuple_cost = 0.005  # same scale as above
#cpu_operator_cost = 0.0025  # same scale as above
#effective_cache_size = 128MB
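
For scale, would something like the following be closer? (Illustrative 
values only, not measured recommendations - effective_cache_size at a 
rule-of-thumb two-thirds of RAM, and random_page_cost lowered for cached, 
RAID-backed storage.)

random_page_cost = 2.0         # hypothetical; lower values favour index scans
effective_cache_size = 32GB    # roughly two-thirds of 48 GB RAM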

Thanks for the help,

Carlo 



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Re: New server to improve performance on our large and busy DB - advice? (v2)

2010-01-15 Thread Ivan Voras

hi,

You wrote a lot of information here so let's confirm in a nutshell what 
you have and what you are looking for:


* A database that is of small to medium size (5 - 10 GB)?
* Around 10 clients that perform constant write operations to the 
database (UPDATE/INSERT)

* Around 10 clients that occasionally read from the database
* Around 6000 tables in your database
* A problem with tuning it all
* Migration to new hardware and/or OS

Is this all correct?

The first thing that is noticeable is that you seem to have way too few 
drives in the server - not because of the disk space required but because 
of speed. You didn't say what type of drives you have or what you would 
consider desirable performance levels, but offhand (because of the "10 
clients performing constant writes" part) you will probably want at least 
2x-4x more drives.


 1) Which RAID level would you recommend?

With only 4 drives, RAID 10 is the only thing usable here.

 2) Which Windows OS would you recommend? (currently 2008 x64 Server)

Would not recommend Windows OS.

 3) If we were to port to a *NIX flavour, which would you recommend? (which
 support trouble-free PG builds/makes please!)

Practically any. I'm biased toward FreeBSD, but a nice, supported version of 
Linux will probably be fine.


 4) Is this the right PG version for our needs?

If you are starting from scratch on a new server, go for the newest 
version you can get - 8.4.2 in this case.


Most importantly, you didn't say what you would consider desirable 
performance. The hardware and the setup you described will work, but not 
necessarily fast enough.


 . So far, we have never seen a situation where a seq scan has improved
 performance, which I would attribute to the size of the tables

... and to the small number of drives you are using.

 . We believe our requirements are exceptional, and we would benefit
 immensely from setting up the PG planner to always favour 
index-oriented decisions


Have you tried decreasing random_page_cost in postgresql.conf? Or 
setting (as a last resort) enable_seqscan = off?
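
A low-risk way to experiment is per-session, before touching 
postgresql.conf (a sketch - the table and column names are hypothetical 
stand-ins for one of your slow queries):

-- Default costs: note the plan and the actual runtime.
EXPLAIN ANALYZE SELECT * FROM audit WHERE import_id = 42;

-- Cheaper random-I/O assumption, this session only (value is illustrative):
SET random_page_cost = 2.0;
EXPLAIN ANALYZE SELECT * FROM audit WHERE import_id = 42;
RESET random_page_cost;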



Carlo Stonebanks wrote:
My client just informed me that new hardware is available for our DB 
server.


. Intel Core 2 Quad
. 48 GB RAM
. 4-disk RAID array (RAID level TBD)

I have put the ugly details of what we do with our DB below, as well as the
postgres.conf settings. But, to summarize: we have a PostgreSQL 8.3.6 DB
with very large tables and the server is always busy serving a constant
stream of single-row UPDATEs and INSERTs from parallel automated processes.

There are fewer than 10 users, as the server is devoted to the KB production
system.

My questions:

1) Which RAID level would you recommend?
2) Which Windows OS would you recommend? (currently 2008 x64 Server)
3) If we were to port to a *NIX flavour, which would you recommend? (which
support trouble-free PG builds/makes please!)
4) Is this the right PG version for our needs?

Thanks,

Carlo

The details of our use:

. The DB hosts a data warehouse and a knowledgebase (KB) tracking the
professional information of 1.3M individuals.
. The KB tables related to these 1.3M individuals are naturally also large
. The DB is in a perpetual state of serving TCL-scripted Extract, Transform
and Load (ETL) processes
. These ETL processes typically run 10 at-a-time (i.e. in parallel)
. We would like to run more, but the server appears to be the bottleneck
. The ETL write processes are 99% single-row UPDATEs or INSERTs.
. There are few, if any, DELETEs
. The ETL source data are import tables
. The import tables are permanently kept in the data warehouse so that we
can trace the original source of any information.
. There are 6000+ import tables, and counting
. The import tables range from dozens to hundreds of thousands of rows.
They rarely require more than a pkey index.
. Linking the KB to the source import data requires an audit table of 500M
rows, and counting.
. The size of the audit table makes it very difficult to manage, especially
if we need to modify the design.
. Because we query the audit table in different ways to audit the ETL
processes' decisions, almost every column in the audit table is indexed.
. The maximum number of physical users is 10 and these users RARELY perform
any kind of write
. By contrast, the 10+ ETL processes are writing constantly
. We find that internal stats drift, for whatever reason, causing the planner
to choose seq scans instead of index scans (see the diagnostic query after
this list).
. So far, we have never seen a situation where a seq scan has improved
performance, which I would attribute to the size of the tables
. We believe our requirements are exceptional, and we would benefit
immensely from setting up the PG planner to always favour index-oriented
decisions - which seems to contradict everything that PG advice suggests as
best practice.
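
For the stats-drift point above, a quick staleness check via the standard
statistics views (a sketch):

SELECT relname, n_live_tup, n_dead_tup,
       last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;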

Current non-default conf settings are:

autovacuum = on
autovacuum_analyze_scale_factor = 0.1
autovacuum_analyze_threshold = 250
autovacuum_naptime = 1min
autovacuum_vacuum_scale_factor = 0.2
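
Worth noting: with autovacuum_analyze_scale_factor = 0.1, a 500M-row table
is only auto-analyzed after roughly 50M row changes, which by itself could
explain the stats drift described above. From 8.4 on, this can be overridden
per table (a sketch - the table name is hypothetical; on 8.3 the equivalent
override would go through the pg_autovacuum catalog):

-- Analyze the big audit table after ~0.2% of its rows change (~1M rows):
ALTER TABLE audit SET (autovacuum_analyze_scale_factor = 0.002);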

Re: [PERFORM] Re: New server to improve performance on our large and busy DB - advice? (v2)

2010-01-15 Thread Robert Haas
On Fri, Jan 15, 2010 at 8:43 AM, Ivan Voras ivo...@freebsd.org wrote:
 Have you tried decreasing random_page_cost in postgresql.conf? Or setting
 (as a last resort) enable_seqscan = off?

If you need to set enable_seqscan to off to get the planner to use
your index, the chances that the index is actually going to improve
performance are extremely poor.
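
One way to test that for a specific query is to compare the two plans and
their actual runtimes directly (a sketch; the table and predicate are
hypothetical):

EXPLAIN ANALYZE SELECT * FROM audit WHERE import_id = 42;  -- planner's choice
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM audit WHERE import_id = 42;  -- forced index path
RESET enable_seqscan;
-- If the forced plan's actual time is not clearly lower, the planner was right.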

...Robert



Re: [PERFORM] Re: New server to improve performance on our large and busy DB - advice? (v2)

2010-01-15 Thread Pierre Frédéric Caillaud



  2) Which Windows OS would you recommend? (currently 2008 x64 Server)

Would not recommend Windows OS.


BTW, I'd be interested to know the NTFS fragmentation stats of your
database file.

